@transcend-0 hey!
The issue was solved in #30068. You can install transformers from main
to get correct generation with assisted decoding:

```bash
pip install --upgrade git+https://github.com/huggingface/transformers.git
```
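For context, assisted decoding is enabled by passing a draft model as `assistant_model` to `generate`. A minimal sketch (the OPT checkpoint pair here is an illustrative choice, not from this issue):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint pair; any target/draft pair sharing a tokenizer works.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
target = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", torch_dtype=torch.float16, device_map="auto"
)
assistant = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(target.device)
# Passing `assistant_model` switches `generate` to assisted (speculative) decoding.
out = target.generate(**inputs, assistant_model=assistant, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```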
@zucchini-nlp Thank you very much! 💛 But Issue 2 (the `generation_config` of the draft model being overwritten by the target model's) is not yet settled, which may be worth considering.
@transcend-0 I did not notice the second point about the generation config.
I think overriding the draft model's generation config with the target model's is done as a performance enhancement, so that the assistant model follows the same generation logic as the target model. Maybe @gante has empirical evidence for that.
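As an illustration of that idea (a sketch only, not the actual transformers internals): aligning the assistant's `generation_config` with the target's keeps the sampling flags in sync.

```python
from transformers import GenerationConfig

# Sketch, not the actual transformers internals: before assisted decoding
# starts, sampling-related flags from the target's config are mirrored onto
# the assistant's config so both models shift their distributions the same way.
target_cfg = GenerationConfig(do_sample=True, temperature=0.7, top_k=50)
assistant_cfg = GenerationConfig(do_sample=False)  # whatever the draft shipped with

for flag in ("do_sample", "temperature", "top_k"):
    setattr(assistant_cfg, flag, getattr(target_cfg, flag))

print(assistant_cfg.do_sample, assistant_cfg.temperature, assistant_cfg.top_k)
# -> True 0.7 50
```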
Hey @transcend-0 👋
The generation config is matched in the assistant model to ensure the assistant sees the same flags, ideally causing equivalent distribution shifts on both models. For instance, if we set `generate`
to bias certain tokens, then we also want the assistant model to apply the same bias (to maximize the matches) 🤗
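A sketch of that scenario, assuming the `sequence_bias` flag of `generate` (the checkpoints here are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
assistant = AutoModelForCausalLM.from_pretrained("distilgpt2")

# Bias the token(s) for " Paris"; per the explanation above, the assistant
# sees the same flags, so its drafts stay aligned with the target's choices.
bias = {tuple(tokenizer(" Paris", add_special_tokens=False).input_ids): 10.0}
inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(
    **inputs, assistant_model=assistant, sequence_bias=bias, max_new_tokens=8
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```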
@transcend-0 likewise, the assistant should generate greedily (see the PR above) :)
Thank you very much!
System Info
Python 3.10.11
transformers 4.40.0
torch 2.0.1
Linux version 4.15.0-55-generic x86_64
Who can help?
@ArthurZucker @gante
Reproduction
Checkpoints: vicuna-7b-v1.3 (target), vicuna-68m (draft)
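A minimal reproduction sketch under this setup (the Hub IDs below are assumptions for the named checkpoints; substitute local paths as needed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub IDs are assumptions for the checkpoints above; adjust to your paths.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")
target = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.3", torch_dtype=torch.float16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    "double7/vicuna-68m", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Tell me a story.", return_tensors="pt").to(target.device)

# With greedy decoding, assisted and plain generation should match exactly.
plain = target.generate(**inputs, do_sample=False, max_new_tokens=64)
assisted = target.generate(
    **inputs, do_sample=False, max_new_tokens=64, assistant_model=draft
)
print(tokenizer.decode(plain[0], skip_special_tokens=True))
print(tokenizer.decode(assisted[0], skip_special_tokens=True))
```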
Outputs:
Expected behavior
Hello, Hugging Face team!
There is a strange problem in speculative decoding (assisted decoding); there may be some bugs in the implementation of the feature.