Open purejomo opened 4 months ago
You can refer to this: https://github.com/JAYANDJEAN/From_Transformer_to_GPTs/blob/main/04_llama2/llama.py I use a `use_cache` flag to control whether the cache is used, because the cache is not needed during training.
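To make the idea concrete, here is a minimal sketch of single-head attention with a `use_cache` flag, in the spirit of the linked file. This is not the code from that repo: the class name, weight shapes, and the exact caching logic are assumptions for illustration. With `use_cache=False` (training), attention runs over the full sequence with a causal mask and nothing is stored between calls; with `use_cache=True` (inference), each call appends the new keys/values to the cache and attends over everything seen so far.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class Attention:
    """Minimal single-head attention with an optional KV cache.

    Illustrative sketch only -- the names and shapes here are
    assumptions, not the implementation from the linked repo.
    """
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.wq = rng.normal(size=(dim, dim))
        self.wk = rng.normal(size=(dim, dim))
        self.wv = rng.normal(size=(dim, dim))
        self.cache_k = None
        self.cache_v = None

    def forward(self, x, use_cache=False):
        # x: (seq_len, dim)
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        if use_cache:
            # Inference: append the new keys/values to the running cache
            # and attend over all tokens seen so far.
            if self.cache_k is None:
                self.cache_k, self.cache_v = k, v
            else:
                self.cache_k = np.concatenate([self.cache_k, k], axis=0)
                self.cache_v = np.concatenate([self.cache_v, v], axis=0)
            k, v = self.cache_k, self.cache_v
        scores = q @ k.T / np.sqrt(q.shape[-1])
        if not use_cache:
            # Training path: full sequence in one pass, causal mask,
            # no state kept between calls.
            n = scores.shape[0]
            scores = np.where(np.tril(np.ones((n, n), bool)), scores, -np.inf)
        return softmax(scores) @ v
```

As a sanity check, decoding one token at a time with `use_cache=True` should produce the same outputs as a single full-sequence pass with `use_cache=False`, since the incremental path is causal by construction.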
@JAYANDJEAN Thanks. Does that mean I can turn off caching by modifying the code?
Hello,
Could you please advise me on how to disable the KV cache? I would also appreciate any guidance on how to implement this change in code.
Thank you for your assistance.