CNXDZS closed this issue 12 months ago.
Thanks for your interest. I understand your question. I would recommend that the pre-experienced data and the cherry data come from datasets with a similar distribution.
If you really don't want to train a pre-experienced model on the dataset you plan to select from (e.g., firefly), you can try computing the IFD scores directly with the base model you are going to fine-tune. That may work just fine.
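For reference, IFD (Instruction-Following Difficulty) is the ratio of the model's average loss on the answer when conditioned on the instruction, s(A|Q), to its average loss on the answer alone, s(A). Below is a minimal, model-agnostic sketch of that scoring step, assuming you have already obtained per-token log-probabilities of the answer tokens from whichever model you use (the pre-experienced model, or the base model as suggested above); how to extract them is not shown. The function names, the sample IDs, and the "keep the top fraction by IFD" selection rule are hypothetical illustrations, not the repository's code.

```python
from typing import Dict, List, Sequence, Tuple


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood (cross-entropy) over the answer tokens."""
    if not token_logprobs:
        raise ValueError("need at least one answer token")
    return -sum(token_logprobs) / len(token_logprobs)


def ifd_score(cond_logprobs: Sequence[float], uncond_logprobs: Sequence[float]) -> float:
    """IFD = s(A|Q) / s(A).

    cond_logprobs:   log p(answer token | instruction, previous answer tokens)
    uncond_logprobs: log p(answer token | previous answer tokens only)
    """
    direct = mean_nll(uncond_logprobs)
    if direct == 0:
        raise ValueError("unconditioned loss is zero; IFD is undefined")
    return mean_nll(cond_logprobs) / direct


def select_cherry(
    samples: Dict[str, Tuple[List[float], List[float]]],
    top_fraction: float,
) -> List[str]:
    """Hypothetical selection rule: rank samples by IFD (highest first) and
    keep the top fraction. Each value is (cond_logprobs, uncond_logprobs)."""
    scores = {sid: ifd_score(c, u) for sid, (c, u) in samples.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[:k]


if __name__ == "__main__":
    # Made-up log-probabilities, for illustration only.
    toy = {
        "a": ([-1.0, -1.0], [-2.0, -2.0]),  # IFD = 1.0 / 2.0 = 0.5
        "b": ([-1.5, -1.5], [-2.0, -2.0]),  # IFD = 1.5 / 2.0 = 0.75
        "c": ([-0.2, -0.2], [-2.0, -2.0]),  # IFD = 0.2 / 2.0 = 0.1
    }
    print(select_cherry(toy, top_fraction=0.67))
```

A high IFD means the instruction helps the model little in producing the answer, so the sample is treated as harder. Because the scores depend on which model produces the log-probabilities, scoring with the base model instead of a pre-experienced model will generally give a different ranking.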
Hi authors, this project is great! I'm a bit confused about something and would appreciate your help. Can the Pre-Experienced Model (stage 3) that I fine-tuned on one dataset be used to filter other datasets? For example, I used the pre-experienced samples selected (stage 2) from alpaca_data to fine-tune my pre-trained model, obtained a Pre-Experienced Model, and then used that model to select cherry data from alpaca_data. Could I use this same Pre-Experienced Model to select cherry data from other datasets (such as firefly)? Or do I have to select pre-experienced samples from the other dataset (such as firefly) and fine-tune my pre-trained model again to obtain a new Pre-Experienced Model? My English is poor, so I'm not sure my description is clear. Thanks a lot!