While trying to launch inference on GPU, the following error occurs:
ValueError: For BatchMatMul, inputs shape cannot be broadcast on CPU/GPU, with x shape [const vector][1, 32, 131072, 2], y shape [const vector][2, 2]
It seems Ascend handles this broadcast automatically, while on GPU the LLaMA code needs an update.
Proposed fix: Update mindformers/models/llama/llama_layer.py with:
    class LlamaRotaryEmbedding(Cell):
        ...
        def construct(self, xq, ...):
            freqs_cos = ...
            rotary_mask = ...
    ++      mins_mask = F.tile(mins_mask, (1, xq.shape[1], 1, 1))
    ++      rotary_mask = F.tile(rotary_mask, (1, xq.shape[1], 1, 1))
            ...
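To illustrate why the tile fixes the error, here is a minimal standalone sketch (not the actual mindformers code): BatchMatMul on GPU cannot broadcast a rank-2 mask against a rank-4 query tensor, so the mask has to be expanded and tiled to matching batch dimensions first. The sequence length is shortened for the demo; the shapes otherwise follow the error message.

```python
import numpy as np
import mindspore as ms
from mindspore import ops, Tensor

# Assumes a MindSpore GPU build; on Ascend the broadcast works without tiling.
ms.set_context(device_target="GPU")

batch_matmul = ops.BatchMatMul()

xq = Tensor(np.ones((1, 32, 8, 2)), ms.float16)  # (bs, n_heads, seq, 2); seq shortened from 131072
rotary_mask = Tensor(np.eye(2), ms.float16)      # (2, 2), as in the error message

# Fails on GPU with "inputs shape cannot be broadcast on CPU/GPU":
# out = batch_matmul(xq, rotary_mask)

# Fix: lift the mask to rank 4 and tile it over the head axis, mirroring
# the proposed F.tile(..., (1, xq.shape[1], 1, 1)) patch above.
rotary_mask = rotary_mask.reshape((1, 1, 2, 2))
rotary_mask = ops.tile(rotary_mask, (1, xq.shape[1], 1, 1))  # (1, 32, 2, 2)

out = batch_matmul(xq, rotary_mask)
print(out.shape)  # (1, 32, 8, 2)
```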
You also need to update the shard strategy of the corresponding BatchMatMul operator at the same time; see the sketch below.
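A hypothetical sketch of what that strategy update could look like; the dp/mp names and values are illustrative assumptions, not the actual mindformers configuration, and the call only takes effect under (semi-)auto-parallel mode.

```python
from mindspore import ops

dp, mp = 2, 4  # illustrative data-parallel and model-parallel degrees

bmm = ops.BatchMatMul()
# Before the patch the mask was rank 2, so its input strategy had rank 2.
# Once the mask is tiled to (1, n_heads, ..., ...), the strategy for the
# second input must also become rank 4 and shard the head axis the same
# way as xq (dim 0 of the mask stays 1, so it cannot be split by dp):
bmm.shard(((dp, mp, 1, 1), (1, mp, 1, 1)))
```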
@Vincent34 I've moved the ticket to Gitee.