huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0
14.51k stars 826 forks source link

Codegemma-7b-instruct failure on Metal #2090

Open niklasha opened 2 months ago

niklasha commented 2 months ago

cargo run --features metal --example gemma -- --which code-7b-it --prompt "explain isakmpd's architecture" fails with:

retrieved the files in 27.197292ms
loaded the model in 36.859128625s
explain isakmpd's architectureError: Metal error Invalid matmul arguments [1296, 81, 9, 1] [36864, 256, 4096, 1] (9, 256, 9)

The prompt is not of great importance, other prompts just give different strides, but fails equally. I did look into this a bit, but I confess it sort of goes over my current competence. I thought the stride vector always should be decreasing, but the rhs stride info is, as can be seen [36864, 256, 4096, 1], which does not fit into my mental model. However the running with "--cpu" does accept this. I am still sceptic it does the math correctly, since it too seems to get the same striding, but it may be I that misunderstand the concept.

LaurentMazare commented 2 months ago

Thanks for reporting this, I think it's an issue that only happens on the 7b because of MQA (which is not present on the 2b version which was used for testing), could you give a try to #2091 , hopefully this should provide the appropriate fix.

niklasha commented 2 months ago

I have tested, and it does not crash anymore, thanks, and the output matches "--cpu". However the quality of the response to the example prompt is pretty low, subjectively. But that is not the key issue here I guess :-)

LaurentMazare commented 2 months ago

Glad that it helped. Did you make sure to respect the prompt format? This example is very barebone and doesn't do it for you. https://huggingface.co/blog/codegemma#prompt-format

niklasha commented 2 months ago

Aha! thanks, well I just was testing and did not do my homework. No I did not respect the prompt format :-)