hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]

Some questions about running the scorer for an arbitrary model #5

Closed HelloWorldLTY closed 7 months ago

HelloWorldLTY commented 7 months ago

Hi, thanks for your great work! I notice that you used ChatGPT for the scorer, but there seems to be no place for us to insert our own token. Does this mean we cannot use this scorer with an arbitrary model?

Also, do you think it could be used as an evaluation metric for LLM output? Thanks.

VPeterV commented 7 months ago

Hi, thanks for your interest!

  1. ChatGPT is only used to generate samples for training our scorer, not to score SFT data directly, since scoring with ChatGPT is hard to scale to a large dataset. You can refer to this issue: https://github.com/hkust-nlp/deita/issues/3#issuecomment-1876974953
  2. Yes! The quality scorer works somewhat like a reward model or evaluator, so I think it has the potential to be used for evaluation; see the sketch below.
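For reference, here is a minimal sketch of scoring a single instruction-response pair with the released quality scorer. It assumes the `deita` package is installed and that the `Llama_Scorer` / `infer_quality` interface and the `hkust-nlp/deita-quality-scorer` checkpoint name match the repo README; the example strings are made up.

```python
from deita.selection.scorer import Llama_Scorer

# Assumed checkpoint name on the Hugging Face hub; adjust if the release name differs.
model_name_or_path = "hkust-nlp/deita-quality-scorer"
scorer = Llama_Scorer(model_name_or_path)

# Hypothetical instruction/response pair to score.
input_text = "Suggest a word to describe a UI with helpful tooltips."
output_text = "User-friendly, or intuitive."

# Higher scores indicate higher estimated quality of the response given the instruction.
quality_score = scorer.infer_quality(input_text, output_text)
print(quality_score)
```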
HelloWorldLTY commented 7 months ago

Thanks a lot!