Open JierunChen opened 1 year ago
Hi, @JierunChen. It seems that you loaded our released ckpt. The total/max update steps in your run match the number of updates already recorded in that ckpt, so the code skipped the training phase entirely.
@pengzhiliang Hi, you are right about that. But my question is about the program hanging at the end of training. For example, when training finishes with the output "INFO:fairseq_cli.train:done training in 82265.6 seconds", the program does not exit and continues to occupy the computing resources.
I ran the training code for 1 update. The process hangs and does not exit after showing the message "INFO:fairseq_cli.train:done training in 117.2 seconds". Any idea how to address the issue?
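
To narrow this down, here is a minimal diagnostic sketch. It assumes the hang comes from a non-daemon thread or a child process that is still alive after training ends, which is only a guess and is not confirmed for fairseq. The helper names `lingering_workers` and `arm_hang_dump` are hypothetical and use only the Python standard library.

```python
import faulthandler
import multiprocessing
import threading


def lingering_workers():
    """Return (thread_names, child_process_names) that could keep the interpreter alive.

    Only non-daemon threads other than the main thread are listed, because
    daemon threads do not block interpreter shutdown.
    """
    main = threading.main_thread()
    threads = [
        t.name
        for t in threading.enumerate()
        if t is not main and not t.daemon and t.is_alive()
    ]
    children = [p.name for p in multiprocessing.active_children()]
    return threads, children


def arm_hang_dump(timeout_seconds=60):
    """Dump the stack of every thread to stderr if the process is still running
    after `timeout_seconds`. The process is not killed (exit=False)."""
    faulthandler.dump_traceback_later(timeout_seconds, exit=False)
```

Calling `arm_hang_dump()` and printing `lingering_workers()` right after the "done training" message is logged (the exact place in the training script is an assumption) should show which thread or process, if any, is still blocking exit.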