kkjh0723 closed this issue 1 year ago.
Thanks for reaching out! The only field relevant for training the model is `num_example_per_prompt`; you can safely ignore the rest or fill in dummy values in their place.
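For private data, preparing these fields might look roughly like the following minimal sketch. It assumes, without confirmation in this thread, that `num_example_per_prompt` is the number of rows that share the same prompt, and that the prompt text sits in a column named `caption`. The helper `add_num_example_per_prompt` and the dummy value `0` are hypothetical choices for illustration.

```python
from collections import Counter


def add_num_example_per_prompt(rows, prompt_key="caption"):
    """Return copies of `rows` with num_example_per_prompt filled in.

    Assumption: num_example_per_prompt = number of rows sharing the same prompt.
    `prompt_key` ("caption") is an assumed column name; adjust to your data.
    Fields not used for training get dummy placeholder values.
    """
    # Count how many rows use each prompt.
    counts = Counter(row[prompt_key] for row in rows)
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row["num_example_per_prompt"] = counts[row[prompt_key]]
        # Dummy values for fields that training ignores (hypothetical choice).
        new_row.setdefault("ranking_id", 0)
        new_row.setdefault("user_id", 0)
        out.append(new_row)
    return out


rows = [{"caption": "a cat"}, {"caption": "a cat"}, {"caption": "a dog"}]
prepared = add_num_example_per_prompt(rows)
```

`__index_level_0__` is deliberately not filled here, since it can simply be dropped.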
For completeness:
- `user_id` is the ID of the user who made the ranking.
- `__index_level_0__` is an artifact of using HF datasets and can be removed.

We wanted to keep as many fields as possible to enable different use cases.
@yuvalkirstain Thanks for the quick reply. I have two follow-up questions about your answer:
1. `ranking_id` is still unclear to me. How is it different from `user_id`?
2. What exactly does `num_example_per_prompt` represent?

Thanks for the answer!
Thanks for sharing the code and dataset!
I want to train the model on private data, but I don't understand some of the fields in the Pick-a-Pic dataset. Specifically, what do the following fields mean: `ranking_id`, `user_id`, `num_example_per_prompt`, and `__index_level_0__`?
Also, are all of these fields used to train the PickScore model? (It seems that `num_example_per_prompt` is related to the inverse-proportional weighting described in the paper.)
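If `num_example_per_prompt` does drive that inverse-proportional weighting, a batch loss could be weighted roughly as below. This is a sketch under that assumption only: each example gets weight `1 / num_example_per_prompt`, and the weighted sum is normalized by the total weight. The function name `weighted_mean_loss` and the normalization choice are hypothetical, not taken from the PickScore code.

```python
def weighted_mean_loss(losses, num_example_per_prompt):
    """Weighted average of per-example losses.

    Assumed scheme: weight_i = 1 / num_example_per_prompt_i, so all examples
    of a frequent prompt together count about as much as one example of a
    rare prompt. Normalizing by the sum of weights is an assumption; dividing
    by the batch size would be another option.
    """
    if len(losses) != len(num_example_per_prompt):
        raise ValueError("losses and num_example_per_prompt must have equal length")
    weights = [1.0 / n for n in num_example_per_prompt]
    total = sum(w * loss for w, loss in zip(weights, losses))
    return total / sum(weights)


# Two examples share one prompt (count 2); one prompt appears once (count 1).
batch_loss = weighted_mean_loss([1.0, 1.0, 4.0], [2, 2, 1])
```

With this weighting, the two examples sharing a prompt together carry the same total weight (0.5 + 0.5) as the single example whose prompt appears once, so the result here is 5.0 / 2.0 = 2.5. In an actual training loop the same arithmetic would be done on tensors.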