Closed: thesofakillers closed this issue 2 years ago
Hi @thesofakillers, yeah, the evaluation can be slow. I forget the exact timing, but it should finish within an hour if I remember correctly. I will run a test. Do you know how many examples per second the model can process in your case?
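To get that number, you can time the generation loop directly (a minimal sketch; `process_batch` and the batch iterable are hypothetical stand-ins for whatever the evaluation script's inner loop calls, e.g. `model.generate`):

```python
import time

def measure_throughput(process_batch, batches):
    """Return examples/second for a hypothetical process_batch(batch) callable."""
    n_examples = 0
    t0 = time.perf_counter()
    for batch in batches:
        process_batch(batch)       # stand-in for the model's generate call
        n_examples += len(batch)
    elapsed = time.perf_counter() - t0
    return n_examples / elapsed
```

Dividing the total number of evaluation examples by this rate gives a wall-clock estimate to compare against.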
Please ignore me. Thank you for the help.
Hi - thank you very much for the paper and the repository.
I am trying to run the `eval_tk_instruct.sh` script as instructed here on a Titan RTX to reproduce the ROUGE-L and Exact Match metrics. I'm running this on a SLURM-enabled server and initially set an (admittedly optimistic) time limit of 4 hours. It turns out the evaluation was cut short by this limit: 4 hours in, I was only 7% of the way through, with a total estimated evaluation time of around 40 hours.

I then noticed in the paper that you ran your experiments "with 8 A100 GPUs with 48G GPU memory per each". Is this also true (and perhaps necessary) for evaluation? Are my time estimates above therefore expected, or do you think I am doing something wrong? Is GPU parallelism even enabled in the evaluation command? (I see the `deepspeed` arg is not passed.)

Thanks!
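For reference, the total-time figure comes from a naive linear extrapolation of partial progress (a minimal sketch; the example counts below are hypothetical, not the actual dataset size, and progress-bar estimates that account for warm-up may differ somewhat):

```python
def estimate_total_hours(total_examples: int, done: int, hours_elapsed: float) -> float:
    """Extrapolate total evaluation wall time from partial progress."""
    rate = done / hours_elapsed        # examples evaluated per hour so far
    return total_examples / rate

# Hypothetical numbers: 7% done (700 of 10,000 examples) after 4 hours
# extrapolates to roughly 4 / 0.07 ≈ 57 hours in total.
print(round(estimate_total_hours(10_000, 700, 4.0), 1))  # → 57.1
```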