I also encountered the exact same situation.
Sorry for the late reply; I just finished my National Day holiday. How about the Cross-View evaluation, is it the same as the accuracy in my experiments? @fmthoker @EVO0LEE
I got the same accuracy as @fmthoker, 83.8 (CV) and 84.5 (CS), but the accuracy on the training set is only 77.4 (CV), lower than that on the test set!
I encountered the same issue, where my train accuracy is lower than my test accuracy.
Did you get the test accuracy by `python main.py --mode test .....`? I cannot reproduce your problem: when the training is over, my train accuracies are always higher than the test accuracies, both on CV & CS.

Thank you for the reply. I got the accuracy by `python main.py --mode train .....` (params 01); it shows the accuracy after the evaluation in every epoch. And I did use 2 GPUs with data parallel.
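For anyone reading along, this is roughly what 2-GPU data parallelism looks like in PyTorch; the model class below is only a stand-in, not the repo's actual HCN implementation:

```python
import torch
import torch.nn as nn

# Stand-in network; the real model in this repo is the HCN architecture.
class DummyModel(nn.Module):
    def __init__(self, num_classes=60):
        super().__init__()
        self.fc = nn.Linear(256, num_classes)

    def forward(self, x):
        return self.fc(x)

model = DummyModel()
if torch.cuda.device_count() > 1:
    # Replicate the model on GPUs 0 and 1 and split each batch across them.
    model = nn.DataParallel(model, device_ids=[0, 1])
if torch.cuda.is_available():
    model = model.cuda()
```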
Hi @EVO0LEE @fmthoker, I have downloaded the repo and run it again; the results are as follows:
`batch_size` in HCN02 of CS & CV should be changed from 32 to 64, but this did not make a big difference.

Model | Normalized Sequence Length | FC Neuron Numbers | NTU RGB+D Cross Subject (%) | NTU RGB+D Cross View (%)
---|---|---|---|---
HCN[1] | 32 | 256 | 86.5 | 91.1
HCN | 32 | 256 | 84.5 | 89.5
HCN | 64 | 512 | 84.9* | 90.7*
So the results indeed have small variations because of the random functions in my script. I suggest that you run several trials and average their results. I have updated the results in the README and also added new training curves for CS.
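In case it helps, a minimal sketch of averaging several trials; the accuracy values below are purely illustrative placeholders, not measured results:

```python
from statistics import mean, stdev

# Best test accuracies (%) from independent trials -- placeholder numbers only.
trial_best_acc = [84.3, 84.7, 84.5]

print(f"mean = {mean(trial_best_acc):.2f} %, std = {stdev(trial_best_acc):.2f}")
```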
Note that the reported accuracies are the best, not the final, results from the curves produced by Visdom; you can also read them directly in `best_accuracy_series.xlsx` of each trial. Alternatively, you can run `python main.py --mode test --load True ......` to get them. The best model is automatically saved in `./checkpoints/best.pth.tar` for each trial.

As for data parallel, it will harm the accuracy, and I have not found out why.
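If you want to inspect a saved checkpoint directly, a minimal sketch is below; the keys inside the checkpoint dict (`state_dict`, `best_acc`) are assumptions and may not match what the training script actually stores:

```python
import torch

# Path taken from the comment above; the structure of the dict is an assumption.
checkpoint = torch.load('./checkpoints/best.pth.tar', map_location='cpu')

print(checkpoint.get('best_acc', 'no best_acc key in this checkpoint'))
# model.load_state_dict(checkpoint['state_dict'])  # hypothetical: restore weights, then evaluate
```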
I came across this repo. It is nice work.
Just a small reminder: reporting results using the model with the highest validation accuracy is only valid when your validation set and test set are different. Otherwise, it is equivalent to tuning parameters (the number of iterations in this case) on the test set. To my understanding, in your NTU RGB+D experiments the validation set is not a different set from the one used for testing.
I don't feel this little caveat undermines the value of this work, but it would be better if we could avoid it in published papers.
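To make the suggestion concrete, here is a minimal sketch of holding out a validation split from the training data for model selection so that the test split is only touched once; the sizes and indices are generic placeholders, not the NTU CS/CV protocol:

```python
import random

# Placeholder sample indices for the training portion of a dataset.
train_indices = list(range(40000))
random.shuffle(train_indices)

# Carve a validation set out of the training data; select the best model on it.
# The test set stays untouched until the single final evaluation.
num_val = int(0.1 * len(train_indices))
val_indices = train_indices[:num_val]
train_indices = train_indices[num_val:]

print(f"train: {len(train_indices)}, val: {len(val_indices)}")
```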
@yjxiong Thanks for the reminder! We have also noticed that some works, e.g. this paper (http://cn.arxiv.org/pdf/1806.00451.pdf), argue that selecting models and tuning parameters on the test set is harmful for generalization; that paper also states that this is a relatively common problem in the community (see the quotation below) and in many popular repos, e.g. repo1, etc.

"we typically have limited access to new data from the same distribution. It is now commonly accepted to re-use the same test set multiple times throughout the algorithm and model design process." (referring to the Introduction of that work)

We admit that it is indeed an urgent problem for the community and that it is time to correct it. Thank you very much for the reminder; we will not use this practice in papers.
Hi, I ran your experiment with the following setting:

Model | Normalized Sequence Length | FC Neuron Numbers
---|---|---
HCN[1] | 32 | 256

However, my accuracy is only 83.8 for the cross subject evaluation.
Also, for the other setting, my accuracy is only 84.5 for the cross subject evaluation.