project-baize / baize-chatbot

Let ChatGPT teach your own chatbot in hours with a single GPU!
https://arxiv.org/abs/2304.01196
GNU General Public License v3.0

SDF, how does it work? #45

Open qrdlgit opened 1 year ago

qrdlgit commented 1 year ago

I read the technical report, but there wasn't much info about the SDF. How does it work?

Do you intend to release a more detailed paper soon, or are you considering keeping this closed?

JetRunner commented 1 year ago

We'll release the code soon. It's actually very simple. Just ask ChatGPT to pick the best response and use that to fine-tune Baize.

qrdlgit commented 1 year ago

Is that really self-distillation? It sounds more like synthetic data generation and you're still distilling ChatGPT into the model.

Don't get me wrong, it's a great approach and lets people distill with much less contamination, but I think the title is a bit confusing and burying the lead.

JetRunner commented 1 year ago

> Is that really self-distillation? It sounds more like synthetic data generation and you're still distilling ChatGPT into the model.
>
> Don't get me wrong, it's a great approach and lets people distill with much less contamination, but I think the title is a bit confusing and burying the lead.

But in SDF, all four responses are generated by the Baize model itself. ChatGPT only helps choose which one to use. That's why we call it "self-distillation with feedback".
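
For readers following along, here is a minimal sketch of the loop as described in this thread. It is not the project's released code. The names `collect_sdf_examples`, `generate`, and `judge` are hypothetical: `generate` stands in for sampling a response from the Baize model, and `judge` stands in for asking ChatGPT (or a human annotator) which candidate is best. The default of four candidates comes from the comment above. Everything else is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical signatures for illustration; not taken from the Baize codebase.
Generate = Callable[[str], str]          # samples one response from the model being trained
Judge = Callable[[str, List[str]], int]  # returns the index of the preferred candidate


def collect_sdf_examples(
    prompts: List[str],
    generate: Generate,
    judge: Judge,
    num_candidates: int = 4,
) -> List[Dict[str, str]]:
    """Build fine-tuning pairs from the model's own samples, filtered by a judge."""
    examples = []
    for prompt in prompts:
        # Every candidate comes from the model itself, not from the judge.
        candidates = [generate(prompt) for _ in range(num_candidates)]
        # The judge only selects; it never writes a response.
        best = judge(prompt, candidates)
        if not 0 <= best < len(candidates):
            raise ValueError(f"judge returned invalid index {best}")
        examples.append({"prompt": prompt, "response": candidates[best]})
    return examples


def toy_generate(prompt: str) -> str:
    """Placeholder sampler; a real setup would sample from the model here."""
    return f"draft answer to: {prompt}"


def toy_judge(prompt: str, candidates: List[str]) -> int:
    """Placeholder judge that always prefers the last candidate."""
    return len(candidates) - 1


if __name__ == "__main__":
    data = collect_sdf_examples(["What is 2 + 2?"], toy_generate, toy_judge)
    print(data)
```

The resulting prompt/response pairs would then feed an ordinary supervised fine-tuning step. Passing the judge in as a callable keeps the feedback source swappable.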

qrdlgit commented 1 year ago

Right, but it's the intelligence of ChatGPT you're distilling into your model.

If a child learning math gives 4 answers to a math question, and then a teacher tells him what's correct, you would not say the child is self-learning.

Note, this isn't to criticize. What you've done is hyper cool. I just think it might be made more clear.

And for what it's worth, I regularly spam your project in a bunch of different places. You folks are doing some of the coolest things, imho.

JetRunner commented 1 year ago

> Right, but it's the intelligence of ChatGPT you're distilling into your model.
>
> If a child learning math gives 4 answers to a math question, and then a teacher tells him what's correct, you would not say the child is self-learning.
>
> Note, this isn't to criticize. What you've done is hyper cool. I just think it might be made more clear.

None taken. You're right, and that is indeed our motivation: to find another way, besides SFT, to learn from ChatGPT. The ChatGPT feedback could also be replaced with human preferences. We'll think again about the name, but we may not be able to update it because of the EMNLP anonymity period. Thanks for your comments!