Closed: thesofakillers closed this issue 2 years ago
Hi @thesofakillers, yeah, the evaluation can be slow. I forget the exact timing, but it should finish within an hour if I remember correctly. I will run a test. Do you know how many examples per second the model can process in your case?
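To get that number, you can time the generation loop directly (a minimal sketch; `process_batch` and the batch iterable are hypothetical stand-ins for whatever the evaluation script's inner loop calls, e.g. `model.generate`):

```python
import time

def measure_throughput(process_batch, batches):
    """Return examples/second for a hypothetical process_batch(batch) callable."""
    n_examples = 0
    t0 = time.perf_counter()
    for batch in batches:
        process_batch(batch)       # stand-in for the model's generate call
        n_examples += len(batch)
    elapsed = time.perf_counter() - t0
    return n_examples / elapsed
```

Dividing the total number of evaluation examples by this rate gives a wall-clock estimate to compare against.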
Please ignore me. Thank you for the help.
Hi - thank you very much for the paper and the repository.
I am trying to run the `eval_tk_instruct.sh` script as instructed here on a Titan RTX to reproduce the ROUGE-L and Exact Match metrics. I'm running this on a SLURM-enabled server and initially set an (admittedly optimistic) time limit of 4 hours. It turns out the evaluation was cut short by this limit: 4 hours in, I was only 7% of the way through, with a total estimated evaluation time of around 40 hours.

I then noticed in the paper that you ran your experiments "with 8 A100 GPUs with 48G GPU memory per each". Is this also true (and perhaps necessary) for evaluation? Are my time estimates above therefore expected, or do you think I am doing something wrong? Is GPU parallelism even enabled in the evaluation command? (I see the `deepspeed` arg is not passed.)

Thanks!
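For reference, the total-time figure comes from a naive linear extrapolation of partial progress (a minimal sketch; the example counts below are hypothetical, not the actual dataset size, and progress-bar estimates that account for warm-up may differ somewhat):

```python
def estimate_total_hours(total_examples: int, done: int, hours_elapsed: float) -> float:
    """Extrapolate total evaluation wall time from partial progress."""
    rate = done / hours_elapsed        # examples evaluated per hour so far
    return total_examples / rate

# Hypothetical numbers: 7% done (700 of 10,000 examples) after 4 hours
# extrapolates to roughly 4 / 0.07 ≈ 57 hours in total.
print(round(estimate_total_hours(10_000, 700, 4.0), 1))  # → 57.1
```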