Closed: KnutJaegersberg closed this issue 11 months ago.
cc @ArthurZucker @gante
Hey! Could you provide a full reproducer? `past_key_values` should be supported, as it's required for fast generation with `use_cache=True`!
Hey @KnutJaegersberg 👋
The root issue is that RWKV, being an RNN at its core, does not have a growing key-value cache (`past_key_values`) that can be sliced. Instead, it has the state of the recurrent neural net, which is updated at each generation iteration.
Since the implementations of CFG and contrastive search (and some other methods) rely on the ability to slice the cache to remove old data, there is no immediate solution for RWKV.
You can probably implement equivalent versions of these techniques for models that have a state (as opposed to a growing cache) by recomputing the RWKV state as needed :)
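To illustrate the idea, here is a minimal sketch: instead of slicing `past_key_values`, you re-run the kept prefix through the model to rebuild the recurrent state. The `recompute_state` helper is hypothetical, written for this example; only `RwkvForCausalLM`'s `state` input/output is taken from the actual API.

```python
# Sketch: emulate "cache slicing" for RWKV by recomputing the recurrent state.
# `recompute_state` is a hypothetical helper, not part of transformers.
import torch
from transformers import AutoTokenizer, RwkvForCausalLM

model_id = "RWKV/rwkv-4-1b5-pile"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = RwkvForCausalLM.from_pretrained(model_id)

def recompute_state(input_ids: torch.Tensor):
    """Return the RWKV recurrent state after consuming `input_ids` from scratch."""
    with torch.no_grad():
        outputs = model(input_ids=input_ids, use_cache=True)
    return outputs.state

prompt_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids

# A transformer-style method would slice past_key_values to drop the last k
# tokens; with RWKV we rebuild the state over the shortened prefix instead.
k = 2
state = recompute_state(prompt_ids[:, :-k])

# The recomputed state can then be fed back in to continue from that point.
next_out = model(input_ids=prompt_ids[:, -k:], state=state, use_cache=True)
```

The trade-off is extra compute: every "slice" costs a forward pass over the retained prefix, whereas slicing a transformer cache is free.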
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
`transformers` version: 4.33.1

Who can help?
@younesbelkada
Information
Tasks
An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
I got this error in textgen-webui using HF-transformers-converted RWKV models, i.e. RWKV/rwkv-4-1b5-pile, but I think it is a HF transformers issue: "TypeError: RwkvForCausalLM.forward() got an unexpected keyword argument 'past_key_values'". Perhaps it's not implemented yet? I used the options for CFG (and also contrastive search, but not in the same go); contrastive search produced the error message above, while CFG just didn't work, with no error message.
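For reference, a standalone reproducer along these lines might look like the sketch below. It is assumed from the description above (the original run was inside textgen-webui, so the exact `generate()` arguments may differ); run each call separately.

```python
# Minimal reproducer sketch, assumed from the report above.
from transformers import AutoTokenizer, RwkvForCausalLM

model_id = "RWKV/rwkv-4-1b5-pile"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = RwkvForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")

# Contrastive search (penalty_alpha + top_k) passes past_key_values into
# forward(), which RwkvForCausalLM does not accept:
# TypeError: RwkvForCausalLM.forward() got an unexpected keyword argument 'past_key_values'
model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=20)

# CFG (guidance_scale > 1) reportedly fails silently in textgen-webui.
model.generate(**inputs, guidance_scale=1.5, max_new_tokens=20)
```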
Expected behavior
Generate some nice CFG'ed tokens, and also contrastive search tokens.