Could the Pre-Experienced Model be used on other datasets?

tianyi-lab / Cherry_LLM

[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models


Hi authors, this project is great! I have a question and would appreciate your help. Can a Pre-Experienced Model (stage 3) that I fine-tuned on one dataset be used to filter other datasets? For example, I used the selected pre-experienced samples (stage 2) from alpaca_data to fine-tune my pretrained model and obtained a Pre-Experienced Model, and then used this model to select cherry data from alpaca_data. Could I use this same Pre-Experienced Model to select cherry data from other datasets (such as firefly)? Or do I have to select pre-experienced samples from the other dataset (such as firefly) and fine-tune my pretrained model on them to obtain a new Pre-Experienced Model? My English is poor, so I'm not sure whether my description is clear. Thanks a lot!
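To make the question concrete, here is a minimal sketch of the kind of scoring step I have in mind. It assumes the difficulty score is the ratio between the model's average loss on the answer given the instruction and its average loss on the answer alone, and that a higher ratio means a harder sample. The function names (`mean_nll`, `difficulty_ratio`, `select_top_fraction`) and the toy numbers are hypothetical and are not taken from the repository's code. The per-token log-probabilities would come from whichever Pre-Experienced Model is used, so mechanically any dataset can be scored; what I am unsure about is whether the scores are still meaningful when the model was trained on a different dataset.

```python
from typing import Dict, List, Sequence


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Average negative log-likelihood over the answer tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def difficulty_ratio(cond_logprobs: Sequence[float],
                     direct_logprobs: Sequence[float]) -> float:
    """Assumed score: loss(answer | instruction) / loss(answer alone)."""
    return mean_nll(cond_logprobs) / mean_nll(direct_logprobs)


def select_top_fraction(scores: Dict[str, float], fraction: float) -> List[str]:
    """Return the ids of the highest-scoring samples (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Toy log-probabilities standing in for model outputs on a new dataset.
scores = {
    "sample_a": difficulty_ratio([-0.4, -0.4], [-2.0, -2.0]),
    "sample_b": difficulty_ratio([-1.8, -1.8], [-2.0, -2.0]),
    "sample_c": difficulty_ratio([-1.0, -1.0], [-2.0, -2.0]),
}
cherry_ids = select_top_fraction(scores, fraction=0.34)
```

In this sketch only the log-probabilities depend on the model, so swapping alpaca_data for firefly changes the inputs but not the selection code.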


Could the Pre-Experienced Model be used on other datasets? #12