While trying to launch inference on GPU, the following error occurs:
ValueError: For BatchMatMul, inputs shape cannot be broadcast on CPU/GPU, with x shape [const vector][1, 32, 131072, 2], y shape [const vector][2, 2]
It seems Ascend handles this broadcast automatically, while on GPU the LLaMA code needs an update.
Proposed fix: Update mindformers/models/llama/llama_layer.py with:
    class LlamaRotaryEmbedding(Cell):
        ...
        def construct(self, xq, ...):
            freqs_cos = ...
            rotary_mask = ...
    ++      mins_mask = F.tile(mins_mask, (1, xq.shape[1], 1, 1))
    ++      rotary_mask = F.tile(rotary_mask, (1, xq.shape[1], 1, 1))
            ...
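To illustrate why the tile fixes the error, here is a minimal standalone sketch (not the actual mindformers code): BatchMatMul on GPU cannot broadcast a rank-2 mask against a rank-4 query tensor, so the mask has to be expanded and tiled to matching batch dimensions first. The sequence length is shortened for the demo; the shapes otherwise follow the error message.

```python
import numpy as np
import mindspore as ms
from mindspore import ops, Tensor

# Assumes a MindSpore GPU build; on Ascend the broadcast works without tiling.
ms.set_context(device_target="GPU")

batch_matmul = ops.BatchMatMul()

xq = Tensor(np.ones((1, 32, 8, 2)), ms.float16)  # (bs, n_heads, seq, 2); seq shortened from 131072
rotary_mask = Tensor(np.eye(2), ms.float16)      # (2, 2), as in the error message

# Fails on GPU with "inputs shape cannot be broadcast on CPU/GPU":
# out = batch_matmul(xq, rotary_mask)

# Fix: lift the mask to rank 4 and tile it over the head axis, mirroring
# the proposed F.tile(..., (1, xq.shape[1], 1, 1)) patch above.
rotary_mask = rotary_mask.reshape((1, 1, 2, 2))
rotary_mask = ops.tile(rotary_mask, (1, xq.shape[1], 1, 1))  # (1, 32, 2, 2)

out = batch_matmul(xq, rotary_mask)
print(out.shape)  # (1, 32, 8, 2)
```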
You also need to update the shard strategy of the corresponding BatchMatMul operator at the same time; see the sketch below.
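A hypothetical sketch of what that strategy update could look like; the dp/mp names and values are illustrative assumptions, not the actual mindformers configuration, and the call only takes effect under (semi-)auto-parallel mode.

```python
from mindspore import ops

dp, mp = 2, 4  # illustrative data-parallel and model-parallel degrees

bmm = ops.BatchMatMul()
# Before the patch the mask was rank 2, so its input strategy had rank 2.
# Once the mask is tiled to (1, n_heads, ..., ...), the strategy for the
# second input must also become rank 4 and shard the head axis the same
# way as xq (dim 0 of the mask stays 1, so it cannot be split by dp):
bmm.shard(((dp, mp, 1, 1), (1, mp, 1, 1)))
```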
@Vincent34 I've moved the ticket to Gitee.