mit-han-lab / offsite-tuning

Offsite-Tuning: Transfer Learning without Full Model
https://arxiv.org/abs/2302.04870
MIT License

Usage of Pile dataset to train the emulator #10

Open ziqi-zhang opened 8 months ago

ziqi-zhang commented 8 months ago

Hi,

I noticed that you trained the NLP emulator with the first 30 chunks of the Pile dataset. I wonder how large the 30 chunks are? Or, in other words, how many chunks does the Pile have in total? The original Pile dataset is over 800 GB, which is too big for most labs...

Also, did you try using smaller datasets, such as WikiText? How does the method perform when trained on these smaller datasets?
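(For context, one common way to work with a corpus this large without downloading all of it is to stream it and cap the number of samples. The sketch below is hypothetical and not from the repo; the `fake_stream` generator stands in for a streamed corpus such as `datasets.load_dataset(..., streaming=True)`, and the sample budget is arbitrary.)

```python
from itertools import islice


def take_first_n(stream, n):
    """Materialize only the first n records from an iterable corpus,
    so a multi-hundred-GB dataset never has to be fully stored locally."""
    return list(islice(stream, n))


# Stand-in for a streamed text corpus; each element mimics one record
# as yielded by a streaming dataset loader.
def fake_stream():
    i = 0
    while True:
        yield {"text": f"document {i}"}
        i += 1


subset = take_first_n(fake_stream(), 3)
print(len(subset))        # 3
print(subset[0]["text"])  # document 0
```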

Thanks

KKNakkav2 commented 6 months ago

Hello @ziqi-zhang,

May I ask whether you were able to train on a smaller dataset for emulator distillation? If so, how did the method perform when distilled on smaller datasets? Any insights would be helpful for better understanding the proposed algorithm.

Thanks