hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]

Some questions about running the scorer for an arbitrary model #5

Closed HelloWorldLTY closed 10 months ago

HelloWorldLTY commented 10 months ago

Hi, thanks for your great work! I notice that you used ChatGPT for the scorer, but there seems to be no place for us to insert our own API token. Does this mean we cannot use this scorer with an arbitrary model?

Moreover, do you think it can be used as an evaluation metric for LLM output? Thanks.

VPeterV commented 10 months ago

Hi. Thanks for your interest!

  1. ChatGPT is only used to generate the samples for training our scorer, not to score SFT data directly, since scoring with ChatGPT would be hard to scale to the full dataset size. You can refer to this issue: https://github.com/hkust-nlp/deita/issues/3#issuecomment-1876974953
  2. Yes! The quality scorer works somewhat like a reward model or evaluator, so I think it has the potential to be used for evaluation (see the sketch after this list).
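
A minimal sketch of scoring an arbitrary instruction-response pair with the released quality scorer, assuming the `Llama_Scorer` interface shown in this repo's README and the `hkust-nlp/deita-quality-scorer` checkpoint; no OpenAI token is involved because the scorer runs locally:

```python
# Sketch: score any model's output with the local quality scorer (no ChatGPT / API token needed).
# Assumes the Llama_Scorer interface from the deita README; the instruction/response strings
# below are hypothetical examples.
from deita.selection.scorer import Llama_Scorer

scorer = Llama_Scorer("hkust-nlp/deita-quality-scorer")

instruction = "Explain the difference between a list and a tuple in Python."
response = "Lists are mutable and written with [], while tuples are immutable and written with ()."

quality_score = scorer.infer_quality(instruction, response)  # higher roughly means better quality
print(quality_score)
```

The same pattern could serve as a lightweight evaluation signal for outputs from any model, though the scorer was trained for data selection rather than as a general-purpose evaluator.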
HelloWorldLTY commented 10 months ago

Thanks a lot!