ncsoft / offsetbias

Official implementation of "OffsetBias: Leveraging Debiased Data for Tuning Evaluators"
BSD 3-Clause "New" or "Revised" License

Request for Details on Base-data Generation and Single Scoring Data Training #1

Open Joe-Hall-Lee opened 1 month ago

Joe-Hall-Lee commented 1 month ago

Hi,

Thank you for your excellent work. I am very interested in the paper and am currently trying to reproduce your results.

The paper mentions that Base-data includes a subset of PKU-SafeRLHF, but it doesn't provide details on the selection criteria for this subset. I also noticed that you converted 3.1k scoring data points from Ultrafeedback into a pairwise preference format.

Could you please provide the scripts to generate Base-data? Alternatively, would it be possible to share the Base-data itself?

Furthermore, I am curious about the method you used to train the model using single scoring data. The paper does not provide a prompt template for this process. Could you please explain how you trained with single scoring data?

I would really appreciate it if you could provide the training code. It will be of great help to me.

Best regards.

parkjunsoo91 commented 1 month ago

Hi, thank you for your interest in our work.

The paper's description of Base-data is indeed brief, because we considered the specifics of Base-data too minor and not the focus of the paper. Unfortunately, it is currently difficult to share either the exact processing code or the training code, as both involve libraries used internally at the company. I'm also afraid we cannot share the exact data because of licensing issues. However, I'll be happy to clarify the process for building Base-data.

For PKU-SafeRLHF, we used the RLHFlow/PKU-SafeRLHF-30K-standard dataset.

For the single-scoring data, we have Helpsteer and Ultrafeedback:

The Helpsteer data was processed as follows:

The Ultrafeedback data was processed as follows:

For preparing 3.1k pairwise data from Ultrafeedback, we do the following:
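For readers who want a starting point, a generic score-to-pairwise conversion might look like the sketch below. It is not the processing code used for the paper. The field names (`prompt`, `response`, `score`), the `min_gap` threshold, and the choice to pair the highest-scored response with the lowest-scored one are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def scores_to_pairs(rows: List[Dict], min_gap: float = 1.0) -> List[Dict]:
    """Turn per-response scores into (chosen, rejected) preference pairs.

    Hypothetical helper: the schema and the pairing rule are assumptions,
    not the procedure used to build Base-data.
    """
    # Group all scored responses by the prompt they answer.
    by_prompt = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt"]].append(row)

    pairs = []
    for prompt, group in by_prompt.items():
        if len(group) < 2:
            continue  # a pair needs at least two responses
        best = max(group, key=lambda r: r["score"])
        worst = min(group, key=lambda r: r["score"])
        # Skip prompts whose responses are scored too closely to rank.
        if best["score"] - worst["score"] < min_gap:
            continue
        pairs.append(
            {
                "prompt": prompt,
                "chosen": best["response"],
                "rejected": worst["response"],
            }
        )
    return pairs


# Toy data, invented for this example only.
rows = [
    {"prompt": "What is 2 + 2?", "response": "4", "score": 5.0},
    {"prompt": "What is 2 + 2?", "response": "5", "score": 1.0},
    {"prompt": "What is 2 + 2?", "response": "Four.", "score": 4.0},
    {"prompt": "Name a prime number.", "response": "7", "score": 5.0},
    {"prompt": "Name a prime number.", "response": "11", "score": 5.0},
]
pairs = scores_to_pairs(rows)
```

Requiring a minimum score gap is one way to avoid emitting pairs from near-ties, where the preference label would be mostly noise. Whether the original pipeline applied such a filter is not stated in this thread.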

Hope this helps!