Feature request
I noticed that only LLaMA and OPT models currently support FX tracing with KV cache. Is there a plan to extend this to more models? Thanks!
Motivation
I would like to run FX-traced GraphModules with generate(), which uses the KV cache. This currently works for OPT and LLaMA, but I would like to try it on more models.
Your contribution
If someone could point me to the general design pattern for making a model FX-traceable with KV cache, or to the changes in modeling_opt.py or modeling_llama.py that made them work, I would be happy to submit PRs to support more models.