Open Theodotus1243 opened 3 months ago
Thanks for the suggestion. Do you have any reference code that implements this technique?
I also recommend not duplicating your issue (it actually lowers your chances of getting an answer).
Oh sorry I just saw that you've already provided one. I see that it's using trl 👍. We welcome contributions, if anyone is interested in opening a PR, so that we can see what changes are needed to have this feature.
Hi @qgallouedec, I would like to work on a PR for this 😄
Thanks @northern-64bit, feel free to open a PR. More generally, when an issue is open, anyone is free to work on it.
Excellent, expect to see a PR in a week hopefully :)
Feature request
Would it be possible to add support for training on the logits of a bigger model? That is, training on the larger model's logits instead of the one-hot vectors derived from the dataset text.
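To make the request concrete, here is a minimal sketch of the difference between the two training signals, in plain Python with no framework. It is not trl's or DistillKit's actual code. The function names, the "student"/"teacher" naming, the temperature value, and the `temperature ** 2` scaling are assumptions taken from the common knowledge-distillation formulation.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def hard_label_loss(student_logits, target_index):
    """Usual cross-entropy against a one-hot target from the dataset text."""
    return -log_softmax(student_logits)[target_index]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Hypothetical helper: the temperature ** 2 factor is the conventional
    scaling that keeps gradient magnitudes comparable across temperatures.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


# Toy vocabulary of 4 tokens for a single position.
student = [1.0, 0.5, -0.2, 0.1]
teacher = [2.0, 1.5, -1.0, 0.0]

print("one-hot loss:", hard_label_loss(student, target_index=0))
print("logit loss:  ", distillation_loss(student, teacher))
```

In practice the two losses are often mixed with a weighting coefficient, and the per-position loss is averaged over all tokens in the batch.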
Motivation
DistillKit and Nvidia Minitron both show good results with this technique.
Your contribution
DistillKit: https://github.com/arcee-ai/DistillKit
Minitron: https://arxiv.org/pdf/2408.11796