Open N3RDIUM opened 1 month ago
I've never used Quanto, to be honest, so I'm not sure I can help you here, but have you tried excluding the lm_heads? Referring to the code snippet here, you could do `exclude=lm_heads`. Let me know if it helps.
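For context, quanto's `quantize` accepts an `exclude` argument that takes glob patterns of module names to leave unquantized, so the suggestion would look roughly like `quantize(model, weights=qint8, exclude="lm_heads*")` (the exact pattern is an assumption; the module name depends on the parler-tts architecture). A minimal sketch of how such glob-based exclusion filters module names:

```python
import fnmatch

def modules_to_quantize(module_names, exclude="*lm_heads*"):
    """Return the module names NOT matched by the exclude glob pattern.

    This mimics (illustratively, not exactly) how a quantizer can skip
    modules such as LM heads when an `exclude` pattern is given.
    """
    return [name for name in module_names if not fnmatch.fnmatch(name, exclude)]

# Hypothetical module names, loosely modeled on a parler-tts-like model
names = [
    "decoder.layers.0.self_attn.q_proj",
    "decoder.lm_heads.0",
    "text_encoder.embed",
]
print(modules_to_quantize(names))
# → ['decoder.layers.0.self_attn.q_proj', 'text_encoder.embed']
```

The LM heads are a common exclusion because quantizing the final projection tends to hurt output quality disproportionately.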
Thanks for your reply! Sadly, it still throws the same error.
Hey there. I'm trying to use parler-tts for near-realtime text-to-speech on CPU inference, just fast enough for conversations. I'm trying to quantize your model in int8 using the following code:

It gives me this error:

Can someone help me figure out what I'm doing wrong?