Closed. SinanAkkoyun closed this 1 month ago.
This seems fine, but do note that the chat example uses the deprecated streaming generator, which will be removed at some point (or replaced with a wrapper). Either way, speculative decoding performance is better in the dynamic generator, so I don't think it makes much sense to tune it in the old generator.
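For context, here is a minimal sketch of how speculative decoding is set up with the dynamic generator instead. The class and argument names (`ExLlamaV2DynamicGenerator`, `num_draft_tokens`, `load_autosplit`) are assumed to match the installed exllamav2 version, and the model paths are placeholders:

```python
# Sketch only: speculative decoding via the dynamic generator.
# Class/argument names below are assumptions; check the repo examples
# for the exact signatures in your exllamav2 version.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

def load(model_dir):
    # Load a model and its cache from a local directory
    config = ExLlamaV2Config(model_dir)
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy = True)
    model.load_autosplit(cache)
    return model, cache, config

model, cache, config = load("/path/to/main_model")          # placeholder path
draft_model, draft_cache, _ = load("/path/to/draft_model")  # placeholder path
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2DynamicGenerator(
    model = model,
    cache = cache,
    draft_model = draft_model,
    draft_cache = draft_cache,
    tokenizer = tokenizer,
    num_draft_tokens = 5,   # tokens drafted per speculative step
)
```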
Added a `-dn` parameter to `examples/chat.py` to control the number of drafted tokens.
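A minimal sketch of how such a flag could be wired through argparse into the streaming generator used by `chat.py`; the long option name `--draft_num_tokens` and the `num_speculative_tokens` keyword are assumptions for illustration, not necessarily what the merged change uses:

```python
# Sketch only: hypothetical wiring of a -dn flag into examples/chat.py.
# The long option name and the generator keyword below are assumptions.
import argparse

parser = argparse.ArgumentParser(description = "Chat example with speculative decoding")
parser.add_argument("-dn", "--draft_num_tokens", type = int, default = 5,
                    help = "Number of tokens to draft per speculative decoding step")
args = parser.parse_args()

# ... model, draft model, caches and tokenizer loaded as in chat.py ...
# generator = ExLlamaV2StreamingGenerator(
#     model, cache, tokenizer,
#     draft_model, draft_cache,
#     num_speculative_tokens = args.draft_num_tokens,
# )
```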