tianyi-lab / Cherry_LLM

[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other models
Any report of time consuming? #13

Closed by redreamality 10 months ago

redreamality commented 10 months ago

I've already moved the tensors to the GPU, and I'm pretty sure the GPU is being used. However, running cherry_seletion/data_analysis.py takes nearly a month on a dataset of about 50k samples. Is that normal?
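
For scale, here is a rough back-of-the-envelope conversion of that figure into time per sample. It assumes a 30-day month and exactly 50,000 samples, both of which only approximate the numbers above; the helper name `seconds_per_sample` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_sample(total_days: float, num_samples: int) -> float:
    """Average wall-clock seconds spent on each sample."""
    return total_days * SECONDS_PER_DAY / num_samples


# Assumption: "nearly a month" ~ 30 days, "about 50k" ~ 50,000 samples.
reported = seconds_per_sample(total_days=30, num_samples=50_000)
print(f"{reported:.2f} s per sample")
```

Under those assumptions, each sample takes close to a minute, which is a lot for a single forward pass.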

MingLiiii commented 10 months ago

Thanks for your interest. That's not normal. Which GPU are you using? On a single A6000 it takes only a couple of hours, since the script just computes losses and no gradients are required.
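
As a minimal sketch of what "computes losses and no gradients are required" can look like in PyTorch, the snippet below scores a batch with per-sequence next-token loss under `torch.no_grad()` and prints where the model actually lives. It is not the repository's `data_analysis.py`: `ToyLM`, `sequence_losses`, the vocabulary size, and the random token IDs are all hypothetical stand-ins so the example runs without downloading a model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Fall back to CPU so the sketch runs anywhere; on a GPU box this should be "cuda".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class ToyLM(nn.Module):
    """Hypothetical stand-in for a causal language model."""

    def __init__(self, vocab_size: int = 100, hidden: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(input_ids))  # (batch, seq, vocab)


def sequence_losses(model: nn.Module, input_ids: torch.Tensor) -> torch.Tensor:
    """Mean next-token cross-entropy per sequence, with autograd disabled."""
    with torch.no_grad():
        logits = model(input_ids)
        shift_logits = logits[:, :-1, :]  # predictions for positions 1..T-1
        shift_labels = input_ids[:, 1:]   # the tokens those positions should predict
        token_loss = F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            reduction="none",
        )
        return token_loss.view(shift_labels.shape).mean(dim=1)


torch.manual_seed(0)
model = ToyLM().to(device).eval()
input_ids = torch.randint(0, 100, (4, 16), device=device)  # made-up token IDs

losses = sequence_losses(model, input_ids)
perplexities = torch.exp(losses)

# Sanity checks worth running when a job is unexpectedly slow.
print("model device:", next(model.parameters()).device)
print("input device:", input_ids.device)
print("loss tracks gradients:", losses.requires_grad)
```

If the printed devices say `cpu`, or the loss still tracks gradients, the run is doing more work than a loss-only pass needs. That would be one place to look, though it may not be the cause here.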