huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0
10.15k stars, 1.28k forks

Training on Teacher model logits #1975

Open Theodotus1243 opened 3 months ago

Theodotus1243 commented 3 months ago

Feature request

Would it be possible to add support for training on a larger (teacher) model's logits? That is, training the student on the teacher's logits instead of the one-hot vectors derived from the dataset text.
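For readers unfamiliar with the idea, here is a minimal, framework-free sketch of the difference between the two training signals for a single token position. It is only an illustration: it is not TRL's API and not DistillKit's code, and the function names, `alpha`, and `temperature` are hypothetical choices made for this example. The `temperature ** 2` factor follows the common convention for keeping soft-target gradients on a comparable scale.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def hard_label_loss(student_logits, target_id):
    """Standard cross-entropy against a one-hot target from the dataset."""
    return -log_softmax(student_logits)[target_id]


def soft_label_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def distillation_loss(student_logits, teacher_logits, target_id,
                      alpha=0.5, temperature=2.0):
    """Blend of the two signals; alpha=0 recovers plain cross-entropy."""
    soft = soft_label_loss(student_logits, teacher_logits, temperature)
    hard = hard_label_loss(student_logits, target_id)
    return alpha * soft + (1.0 - alpha) * hard


# One token position over a toy vocabulary of size 3.
student = [0.5, 1.0, 0.2]
teacher = [0.1, 2.0, 0.3]
loss = distillation_loss(student, teacher, target_id=1)
```

In a real trainer the same computation would presumably run as batched tensor operations over every token position, with the teacher's logits obtained from a frozen forward pass or precomputed and stored alongside the dataset.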

Motivation

DistillKit and NVIDIA's Minitron show good results with this technique.

Your contribution

DistillKit: https://github.com/arcee-ai/DistillKit
Minitron: https://arxiv.org/pdf/2408.11796

qgallouedec commented 2 months ago

Thanks for the suggestion. Do you have any reference code that implements this technique?

I also recommend not duplicating your issue (it actually lowers your chances of getting an answer).

qgallouedec commented 2 months ago

> Do you have any reference code that implements this technique?

Oh sorry, I just saw that you've already provided one. I see that it's using trl 👍. We welcome contributions: if anyone is interested, please open a PR so that we can see what changes are needed to support this feature.

northern-64bit commented 2 months ago

Hi @qgallouedec, I would like to work on a PR for this 😄

qgallouedec commented 1 month ago

Thanks @northern-64bit, feel free to open a PR. More generally, when an issue is open, anyone is free to work on it.

northern-64bit commented 3 weeks ago

> Thanks @northern-64bit, feel free to open a PR. More generally, when an issue is open, anyone is free to work on it.

Excellent, expect to see a PR in a week hopefully :)