mdegans / weave

Branching story writing tool with generative AI

crash on unsupported model #6

Closed mdegans closed 1 month ago

mdegans commented 3 months ago

When a model is unsupported (at least on Metal), an assert in llama.cpp's ggml-metal.m causes a crash. To fix this we need a way to check whether a given backend supports a model and fail gracefully if it doesn't. This check is a good candidate to add to llama.cpp itself, since the issue is in the library and we can't handle it in our code without duplicating the eventual fix.

mdegans commented 3 months ago

There doesn't seem to be a way to check programmatically. In ggml-metal.m there's a gigantic switch statement whose default case is to panic. That logic could be moved into a helper function and exposed in the public API. I'd rather not duplicate it in drama_llama, since every time a kernel is added upstream I'd have to make a matching change. As it is, updating the bindings is just a matter of updating the submodule, building, and fixing the odd breaking API change.
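
To make the maintenance burden concrete, here is a minimal Rust sketch of the kind of table drama_llama would have to keep in sync if the check lived on our side. The names (`GgmlType`, `metal_supports_type`) are purely illustrative and not part of the real drama_llama or llama.cpp APIs.

```rust
// Hypothetical duplication of the ggml-metal.m kernel table on the Rust
// side. Every new kernel or quantization upstream would require editing
// this, which is exactly what we want to avoid.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum GgmlType {
    F32,
    F16,
    BF16,
    Q4_0,
    Q8_0,
    // ...and every other quantization llama.cpp grows over time
}

/// Returns whether the Metal backend has a kernel for this tensor type.
fn metal_supports_type(ty: GgmlType) -> bool {
    match ty {
        GgmlType::F32 | GgmlType::F16 | GgmlType::Q4_0 | GgmlType::Q8_0 => true,
        // bf16 kernels are not implemented on Metal, which is what
        // triggers the assert described in this issue.
        GgmlType::BF16 => false,
    }
}

fn main() {
    // With a check like this we could fail gracefully instead of letting
    // ggml-metal.m assert at runtime.
    if !metal_supports_type(GgmlType::BF16) {
        eprintln!("model uses a tensor type the Metal backend can't run");
    }
}
```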

mdegans commented 3 months ago

This is now fixed in llama.cpp by this PR.

It turns out there was already a static function to check whether an op is supported, but it was returning true for bf16 on Metal even though bf16 isn't implemented there, so the assert was still being hit. The behavior now is that unsupported layers are run on the CPU instead. That's slower than running on the GPU, but the cost can be reduced by increasing the number of threads drama_llama::Engine uses; right now I think it defaults to 1. We could probably change that default to the number of virtual CPUs, or perhaps the number of performance cores if that can be determined easily.
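
As a rough illustration of that default change, here is a minimal Rust sketch that picks the OS-reported logical core count instead of a hard-coded 1. Only `std::thread::available_parallelism` is standard library; how the value would be passed into drama_llama::Engine is deliberately left out, since I'm not assuming anything about its configuration API.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Pick a default thread count for CPU-side layers: the number of
/// virtual CPUs the OS reports, falling back to 1 if it can't be
/// determined. Distinguishing performance cores from efficiency cores
/// would need a platform-specific call (e.g. sysctl on macOS), so this
/// sticks to the portable count.
fn default_threads() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

fn main() {
    println!("defaulting CPU threads to {}", default_threads());
}
```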

mdegans commented 1 month ago

Fixed with version 0.0.3