Open qrdlgit opened 1 year ago
We'll release the code soon. It's actually very simple. Just ask ChatGPT to pick the best response and use that to fine-tune Baize.
Is that really self-distillation? It sounds more like synthetic data generation and you're still distilling ChatGPT into the model.
Don't get me wrong, it's a great approach and lets people distill with much less contamination, but I think the title is a bit confusing and buries the lede.
But for SDF, all four responses are generated by the Baize model itself. ChatGPT only helps choose which one to use. That's why we call it "self-distillation with feedback".
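For anyone following along, here is a minimal sketch of the loop as described in this thread: the model produces four candidate responses per prompt, a judge picks one, and the chosen pairs become fine-tuning data. This is not the authors' released code. The names `build_sdf_dataset`, `generate`, and `pick_best` are hypothetical placeholders for the Baize sampling call and the ChatGPT (or human) ranking step.

```python
from typing import Callable, Dict, List


def build_sdf_dataset(
    prompts: List[str],
    generate: Callable[[str], str],
    pick_best: Callable[[str, List[str]], int],
    num_candidates: int = 4,
) -> List[Dict[str, str]]:
    """Collect (prompt, response) pairs for fine-tuning.

    generate:  hypothetical wrapper that samples one response from the
               model being trained (Baize in this thread).
    pick_best: hypothetical wrapper around the judge (ChatGPT or a human)
               that returns the index of the preferred candidate.
    """
    dataset: List[Dict[str, str]] = []
    for prompt in prompts:
        # Every candidate comes from the model itself.
        candidates = [generate(prompt) for _ in range(num_candidates)]
        # The judge only selects; it never writes a response.
        best = pick_best(prompt, candidates)
        if not 0 <= best < len(candidates):
            continue  # skip prompts where the judge gave an unusable answer
        dataset.append({"prompt": prompt, "response": candidates[best]})
    return dataset
```

The resulting list can then be fed to whatever supervised fine-tuning setup is already in use. Note that the judge's preference is the only outside signal entering the data, which is the point being debated here.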
Right, but it's the intelligence of ChatGPT you're distilling into your model.
If a child learning math gives 4 answers to a math question, and then a teacher tells him what's correct, you would not say the child is self-learning.
Note, this isn't to criticize. What you've done is hyper cool. I just think it could be made clearer.
And for what it's worth, I regularly spam your project in a bunch of different places. You folks are doing some of the coolest things, imho.
None taken. You're right, since that is indeed our motivation: to find another way, besides SFT, to learn from ChatGPT. The ChatGPT feedback can also be substituted with human preferences. We'll think again about the name, but we may not be able to update it during the EMNLP anonymity period. Thanks for your comments!
I read the technical report, but there wasn't much info about the SDF. How does it work?
Is the intention to release a more detailed paper soon, or are you folks considering keeping this closed?