Can you provide me with detailed information on the training time of your model and the amount of memory occupied by the training set after preprocessing?

chenhanxin123 commented 1 year ago

After downloading the dataset, I preprocessed it and ended up with nearly 3 TB of data. Is this normal? I then followed the instructions on GitHub and trained a model from scratch on all 276 datasets with the default -sr of 1. One epoch took up to 30 hours. I then set -sr to 10, and one epoch still took 3 hours, which is very long. I trained for 20 epochs, but the accuracy of the resulting model did not even reach 10%. The paper says the author used a single Tesla V100; I am using a single A6000. I would therefore like to know the details of the author's training setup. I hope you can take the time out of your busy schedule to help me, and I would greatly appreciate it.

robert80203 commented 1 year ago

In our experiment, training with -sr 10 took about a day to converge (roughly 10 to 15 epochs). Did you replace the original coco.py and cocoeval.py? You may also visualize the results to see whether the model has really converged.
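To put the two timing reports side by side, here is a minimal sketch of the arithmetic. It assumes "a day" means 24 hours of wall-clock time, which the reply does not state exactly, and the helper name hours_per_epoch is only illustrative.

```python
def hours_per_epoch(total_hours, epochs):
    """Average wall-clock hours spent on one epoch."""
    return total_hours / epochs


# Figures quoted in this thread; treating "a day" as 24 hours is an assumption.
author_fastest = hours_per_epoch(24, 15)  # converged in 15 epochs
author_slowest = hours_per_epoch(24, 10)  # converged in 10 epochs
reported = 3.0                            # hours per epoch reported with -sr 10

print(f"author: {author_fastest:.1f} to {author_slowest:.1f} h/epoch")
print(f"reported: {reported:.1f} h/epoch")
```

Under that assumption the author's run works out to roughly 1.6 to 2.4 hours per epoch, so 3 hours per epoch is slower but of the same order. The gap in AP is therefore unlikely to be explained by training time alone.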

chenhanxin123 commented 1 year ago

I replaced the coco.py and cocoeval.py in my environment with misc/coco.py and misc/cocoeval.py, as required by the instructions on GitHub. Specifically, I replaced the two files under /home/chenhanxin/anaconda3/envs/p37/lib/python3.7/site-packages/pycocotools/ with misc/coco.py and misc/cocoeval.py, respectively. I noticed that COCO uses 17 keypoints by default, while the author uses 14; without the replacement, an error is raised. I set -sr to 10, and each epoch takes three hours. After three days of training, the results are AP: 0.047, AP.5: 0.090, and AP.75: 0.040, roughly ten times lower than the author's results. I also downloaded the pretrained model (Best.pth) from GitHub, evaluated it on the author's default test set, and obtained AP: 0.649, AP.5: 0.982, and AP.75: 0.782.
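One way to confirm that the replacement took effect in the environment actually used for training is to compare the installed files byte for byte with the repository copies. The following is a minimal sketch, assuming the installed package is pycocotools and that the script runs from the repository root, where misc/ lives; the helper names find_package_dir and check_replaced are hypothetical and not part of the repository.

```python
import filecmp
import importlib.util
from pathlib import Path


def find_package_dir(package_name):
    """Return the directory of an installed package, or None if it is not importable."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def check_replaced(installed_dir, repo_misc_dir, names=("coco.py", "cocoeval.py")):
    """Map each file name to True if the installed copy is byte-identical to the repo copy."""
    result = {}
    for name in names:
        installed = Path(installed_dir) / name
        patched = Path(repo_misc_dir) / name
        result[name] = (
            installed.is_file()
            and patched.is_file()
            and filecmp.cmp(installed, patched, shallow=False)
        )
    return result


if __name__ == "__main__":
    # Assumption: the evaluation code imports the package named "pycocotools".
    pkg_dir = find_package_dir("pycocotools")
    if pkg_dir is None:
        print("pycocotools is not importable in this environment")
    else:
        print(pkg_dir, check_replaced(pkg_dir, "misc"))
```

Running this with the same Python interpreter that launches training matters: if several conda environments exist, the files may have been replaced in one environment while training uses another.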

chenhanxin123 commented 1 year ago

Could you share the training logs and other detailed files for the author's model? My email is chenhanxin@emails.bjut.edu.cn

XIN499 commented 9 months ago

Hello, I had the same problem while training. The AP after training is about ten times lower than the author's. Have you fixed the problem?

ydhgethub commented 2 weeks ago

@chenhanxin123 Hello, could you share the dataset?

robert80203 / HuPR-A-Benchmark-for-Human-Pose-Estimation-Using-Millimeter-Wave-Radar