Closed nickpotafiy closed 1 month ago
This is because quantized and unquantized models use different methods for attention, and I hadn't included a paged method for unquantized models in v0.1.0. It's in the dev branch now, so the dynamic generator should work with FP16 models too, and I plan to release v0.1.1 soon to address some other incoming issues as well.
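To illustrate the failure mode described above, here is a minimal hypothetical sketch (illustrative names only, not exllamav2's actual internals): if the paged attention path only exists on the quantized layer type, the dynamic (paged) generator has nothing to call for an unquantized FP16 layer.

```python
# Hypothetical sketch of the dispatch gap; class and method names are
# illustrative and do not match exllamav2's real code.

class QuantizedAttn:
    def forward_paged(self, x):
        # Quantized layers had a paged attention path in v0.1.0.
        return f"paged({x})"

class FP16Attn:
    # In v0.1.0-style setups, the unquantized layer had no paged
    # method, so the dynamic generator could not run FP16 models.
    def forward(self, x):
        return f"plain({x})"

def paged_forward(layer, x):
    fn = getattr(layer, "forward_paged", None)
    if fn is None:
        raise AttributeError("no paged attention method for this layer type")
    return fn(x)

print(paged_forward(QuantizedAttn(), "q"))   # quantized path works
try:
    paged_forward(FP16Attn(), "x")           # unquantized path is missing
except AttributeError as e:
    print("error:", e)
```

The v0.1.1 fix amounts to giving the unquantized layer an equivalent paged path so both model types dispatch cleanly.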
Thanks!
I've released v0.1.1 now, which should support FP16 models in the new generator.
Hey @turboderp, the latest version does not load a non-quantized model. Possibly `q_handle` being `None` does not sit well with that function call. Specifying `0` avoids this error, but it still fails on that forward call. I could dig into the issue, but you could probably fix it quicker.
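The symptom above can be mimicked with a small hypothetical sketch (illustrative names only, not exllamav2's actual internals): a native-style call that expects an integer handle rejects `None` outright, and a placeholder `0` gets past the type check but still fails downstream because it isn't a valid handle.

```python
# Hypothetical sketch of the q_handle symptom; names are illustrative
# and do not correspond to exllamav2's real bindings.

def native_attn_forward(q_handle, x):
    # A ctypes/pybind-style binding would reject a non-int argument,
    # which mirrors q_handle being None for an unquantized model.
    if not isinstance(q_handle, int):
        raise TypeError(f"expected int handle, got {type(q_handle).__name__}")
    # 0 passes the type check, but no kernel sits behind handle 0,
    # so the forward call still fails.
    if q_handle == 0:
        raise RuntimeError("invalid handle 0: no quantized kernel loaded")
    return f"attn[{q_handle}]({x})"

for handle in (None, 0, 42):
    try:
        print(native_attn_forward(handle, "x"))
    except (TypeError, RuntimeError) as e:
        print(type(e).__name__, "-", e)
```

This matches the two observed failures: `None` breaks the call itself, while `0` merely moves the error into the forward pass.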