chenhanxin123 opened this issue 1 year ago
In our experiment, it took about a day to converge with -sr 10 (about 10 to 15 epochs). Did you replace the original coco.py and cocoeval.py? You may also want to visualize the results to check whether the model has really converged.
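One way to answer the question above is to confirm that the coco.py and cocoeval.py Python actually imports are the patched copies. The sketch below assumes the package is pycocotools and that it is run from the repository root, where misc/ lives; the helper names installed_package_dir and replacement_applied are hypothetical and not part of the repository.

```python
import filecmp
import importlib.util
import os
from typing import Optional


def installed_package_dir(name: str) -> Optional[str]:
    """Directory Python would import top-level package `name` from, or None if absent."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.has_location:
        return None
    # For a regular package, spec.origin is the path to its __init__.py.
    return os.path.dirname(spec.origin)


def replacement_applied(patched_file: str, package: str = "pycocotools") -> bool:
    """True if the installed file with the same basename is byte-identical to patched_file."""
    pkg_dir = installed_package_dir(package)
    if pkg_dir is None or not os.path.isfile(patched_file):
        return False
    installed = os.path.join(pkg_dir, os.path.basename(patched_file))
    # shallow=False compares file contents, not just os.stat() signatures.
    return os.path.isfile(installed) and filecmp.cmp(patched_file, installed, shallow=False)


if __name__ == "__main__":
    # Assumption: run from the repository root so that misc/ is a relative path.
    print("pycocotools imported from:", installed_package_dir("pycocotools"))
    for path in ("misc/coco.py", "misc/cocoeval.py"):
        status = "patched" if replacement_applied(path) else "NOT patched (or not found)"
        print(path, "->", status)
```

Running it inside the same conda environment used for training matters: a second environment, or a later reinstall of pycocotools, would silently bring back the 17-keypoint files.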
I replaced coco.py and cocoeval.py in my environment with misc/coco.py and misc/cocoeval.py, as the instructions on GitHub require. Specifically, I replaced coco.py and cocoeval.py in /home/chenhanxin/anaconda3/envs/p37/lib/python3.7/site-packages/pycocotools/ with misc/coco.py and misc/cocoeval.py, respectively, because COCO uses 17 keypoints by default while the author uses 14; without the replacement, an error is raised. I set -sr to 10, and each epoch takes three hours. After three days of training, I obtained AP: 0.047, AP.5: 0.090, and AP.75: 0.040, roughly ten times lower than the author's results. I also evaluated the best.pth model downloaded from GitHub on the author's default test set and obtained AP: 0.649, AP.5: 0.982, and AP.75: 0.782.
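The 17 vs. 14 keypoint mismatch matters because COCO keypoint AP is built on Object Keypoint Similarity (OKS), which needs one constant (sigma) per joint, and pycocotools hard-codes the 17 COCO sigmas. The pure-Python sketch below follows the per-keypoint formula of pycocotools' computeOks for any number of keypoints. HYPOTHETICAL_SIGMAS_14 is a made-up placeholder, not the values in the repository's misc/cocoeval.py, and the case with no visible ground-truth keypoints is simplified.

```python
import math
import sys
from typing import Sequence, Tuple

Point = Tuple[float, float]

# The 17 COCO keypoint sigmas hard-coded in pycocotools' COCOeval parameters.
COCO_SIGMAS_17 = [s / 10.0 for s in (
    .26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62,
    1.07, 1.07, .87, .87, .89, .89)]

# HYPOTHETICAL placeholder for a 14-joint skeleton; not the repository's values.
HYPOTHETICAL_SIGMAS_14 = [0.05] * 14


def oks(gt: Sequence[Point], pred: Sequence[Point], visible: Sequence[bool],
        area: float, sigmas: Sequence[float]) -> float:
    """Object Keypoint Similarity between one ground-truth and one predicted pose."""
    k = len(gt)
    if not (len(pred) == len(visible) == len(sigmas) == k):
        raise ValueError(
            f"{k} ground-truth keypoints but {len(pred)} predictions, "
            f"{len(visible)} visibility flags and {len(sigmas)} sigmas")
    total, count = 0.0, 0
    for (gx, gy), (px, py), vis, sigma in zip(gt, pred, visible, sigmas):
        if not vis:
            continue  # only labelled keypoints contribute, as in pycocotools
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        var = (2 * sigma) ** 2  # per-keypoint constant k_i = 2 * sigma_i
        total += math.exp(-d2 / var / (area + sys.float_info.epsilon) / 2)
        count += 1
    # Simplification: pycocotools uses a bounding-box-based distance when no
    # ground-truth keypoint is visible; this sketch just returns 0.0.
    return total / count if count else 0.0
```

Calling oks on 14-joint poses with COCO_SIGMAS_17 raises ValueError here; in NumPy-based pycocotools the same length mismatch presumably surfaces as the broadcasting error mentioned above, which is why a 14-joint evaluator needs its own sigma array.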
Could you share the training logs and any other details of how the author's model was trained? My email is chenhanxin@emails.bjut.edu.cn
Hello, I ran into the same problem during training: after training, my AP is about ten times lower than the author's. Have you fixed the problem?
@chenhanxin123 Hello, could you share the dataset?
After downloading and preprocessing the dataset, I ended up with nearly 3 TB of data. Is this normal? I then followed the instructions on GitHub and retrained a model from scratch on all 276 datasets with the default -sr of 1. One epoch took up to 30 hours. I then set -sr to 10, and one epoch still took 3 hours, so training is very slow. I trained for 20 epochs, but the accuracy of the resulting model did not even reach 10%. The paper says the author used a single Tesla V100; I am using a single A6000. Could you share the details of how the author trained the model? I would greatly appreciate it if you could find time in your busy schedule to help.
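A back-of-the-envelope check of the reported timings can help separate "not trained long enough" from "something else is wrong". This sketch assumes that -sr N makes an epoch cost roughly 1/N of a full epoch (the reported drop from 30 h to 3 h is consistent with that, but the repository's exact definition of -sr is an assumption here); the function names are hypothetical.

```python
def hours_per_epoch(full_epoch_hours: float, sr: int) -> float:
    """Epoch time with -sr N, ASSUMING cost scales as 1/N of a full epoch."""
    if sr < 1:
        raise ValueError("sr must be a positive integer")
    return full_epoch_hours / sr


def training_hours(full_epoch_hours: float, sr: int, epochs: int) -> float:
    """Total wall-clock hours for `epochs` epochs under the same assumption."""
    return hours_per_epoch(full_epoch_hours, sr) * epochs


def implied_hours_per_epoch(total_hours: float, epochs: int) -> float:
    """Per-epoch time implied by a reported total, e.g. 'about a day for 10-15 epochs'."""
    return total_hours / epochs


if __name__ == "__main__":
    print("A6000, -sr 10:", hours_per_epoch(30, 10), "h/epoch")
    print("20 epochs:", training_hours(30, 10, 20), "h")
    print("10-15 epochs:", training_hours(30, 10, 10), "to",
          training_hours(30, 10, 15), "h")
    print("maintainer, about 24 h total:", implied_hours_per_epoch(24, 15), "to",
          implied_hours_per_epoch(24, 10), "h/epoch")
```

Under this assumption, 10 to 15 epochs at 3 h each would take 30 to 45 hours, versus roughly 24 hours (1.6 to 2.4 h per epoch) in the maintainer's setup. The 20 epochs reported (about 60 hours) already exceed the 10 to 15 epochs the maintainer needed, which suggests the epoch count alone does not explain the AP gap.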