tchaton opened this issue 6 months ago
🚀 The feature, motivation and pitch

This paper might be of interest: https://arxiv.org/pdf/2403.09636.pdf

According to the paper:

"In our experiments, we equip pre-existing LLMs—such as Llama 2 (Touvron et al., 2023) 7B, 13B, and 70B—with DMC by retrofitting them on a negligible percentage of the original pre-training data (~2% for 2× compression, and ~8% for 8× compression) and without adding any extra parameters to the original LLM."

The required retrofitting data is large IMHO: 2% of Llama 2's ~2T pre-training tokens still comes to roughly 40B tokens for the 2× setting.
Alternatives
No response
Additional context
No response