huguyuehuhu / HCN-pytorch

A PyTorch reproduction of *Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation*.

Difference in reported accuracy #2

Closed fmthoker closed 6 years ago

fmthoker commented 6 years ago

Hi, I ran your experiment with the following setting:

| Model | Normalized Sequence Length | FC Neuron Numbers |
| --- | --- | --- |
| HCN[1] | 32 | 256 |

However, my accuracy is only 83.8 for the cross-subject evaluation.

Also, for

| Model | Normalized Sequence Length | FC Neuron Numbers |
| --- | --- | --- |
| HCN[1] | 32 | 512 |

my accuracy is only 84.5 for the cross-subject evaluation.

EVO0LEE commented 6 years ago

I also encountered the exact same situation.

huguyuehuhu commented 6 years ago

Sorry for the late reply; I had just finished my National Day holiday. How about the Cross-View evaluation? Is it the same as the accuracy in my experiments? @fmthoker @EVO0LEE

EVO0LEE commented 6 years ago

I got the same accuracy as @fmthoker: 83.8 (CV) and 84.5 (CS), but the accuracy on the train set is only 77.4 (CV), lower than that on the test set!

fmthoker commented 6 years ago

I encountered the same issue: my train accuracy is lower than my test accuracy.

huguyuehuhu commented 6 years ago
  1. Where did you get the accuracy? From the curve produced by Visdom, or by running python main.py --mode test .....? I cannot reproduce your problem: when training is over, my train accuracy is always higher than the test accuracy, on both CV and CS.
  2. How many GPUs did you use, and did you use data parallel? I did not use it; I remember it gives lower accuracy than a single GPU, so I avoided it (see the sketch after this list).
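
For reference, a minimal sketch of the single-GPU versus nn.DataParallel paths in PyTorch; the model here is an illustrative stand-in, not the repo's actual HCN network:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for the HCN network.
model = nn.Sequential(nn.Linear(75, 256), nn.ReLU(), nn.Linear(256, 60))

if torch.cuda.device_count() > 1:
    # Multi-GPU path: nn.DataParallel splits each batch across GPUs,
    # which changes the effective per-GPU batch size and can shift
    # batch-norm statistics, and hence the final accuracy.
    model = nn.DataParallel(model)

model = model.cuda()  # single-GPU path when DataParallel is skipped
```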
EVO0LEE commented 6 years ago

Thank you for the reply. I got the accuracy from python main.py --mode train ..... (params 01); it shows the accuracy after the evaluation in every epoch. And yes, I used 2 GPUs with data parallel.

huguyuehuhu commented 6 years ago

Hi, @EVO0LEE @fmthoker, I have downloaded the repo and run it again; the results are as follows:

  1. Sorry, all the batch_size values in HCN02 for CS & CV should be changed from 32 to 64; but this did not make a big difference.
  2. Updated results:

| Model | Normalized Sequence Length | FC Neuron Numbers | NTU RGB+D Cross Subject (%) | NTU RGB+D Cross View (%) |
| --- | --- | --- | --- | --- |
| HCN[1] | 32 | 256 | 86.5 | 91.1 |
| HCN | 32 | 256 | 84.5 | 89.5 |
| HCN | 64 | 512 | 84.9* | 90.7* |

So there are indeed small variations due to randomness in my script. I suggest running several trials and averaging their results (a sketch follows). I have updated the results in the README and also added new training curves for CS.
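
As an illustration of making individual trials repeatable before averaging them, a minimal sketch; the helper name is hypothetical and not part of this repo:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    """Fix all RNG sources so a single trial is repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# e.g. run three trials with seeds 0, 1, 2, then average the accuracies:
# for seed in (0, 1, 2): seed_everything(seed); <train and evaluate>
```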

  1. Tip 1: you should take the best, not the final, results from the curves produced by Visdom, or read them directly from best_accuracy_series.xlsx for each trial. You can also run python main.py --mode test --load True ...... to get them; the best model is automatically saved to ./checkpoints/best.pth.tar for each trial (see the loading sketch after this list).
  2. Tip 2: try not to use data parallel; it harms the accuracy, and I have not found out why.
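
For reference, a minimal sketch of loading such a checkpoint back for testing; the "state_dict" key and the placeholder network are assumptions about the checkpoint layout, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the HCN model.
model = nn.Linear(75, 60)

# Load the best checkpoint saved during training. The "state_dict" key is
# an assumption about how this repo structures its .pth.tar files.
checkpoint = torch.load("./checkpoints/best.pth.tar", map_location="cpu")
model.load_state_dict(checkpoint["state_dict"])
model.eval()  # disable dropout and freeze batch-norm statistics for testing
```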
yjxiong commented 6 years ago

I came across this repo. It is nice work.

Just a small reminder. Reporting results using the model with the highest validation accuracy is only valid when your validation set and test set are different. Otherwise, it is equivalent to tuning parameters (the number of iterations, in this case) on the test set. To my understanding, in your NTU RGB+D experiments the validation set is not different from the set used for testing.
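
For illustration, one way to avoid this is to hold out a validation split from the training data and touch the official test split only once; the dataset here is a dummy stand-in for NTU RGB+D:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the NTU RGB+D training split.
train_set = TensorDataset(torch.randn(1000, 75))

# Hold out 10% of the training data for model selection (e.g. choosing
# the number of iterations); the test split is then used exactly once.
n_val = len(train_set) // 10
train_subset, val_subset = random_split(train_set, [len(train_set) - n_val, n_val])
```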

I don't feel this little caveat undermines the value of this work. But it would be better if we can avoid it in published papers.

huguyuehuhu commented 6 years ago

@yjxiong Thanks for the reminder! We have also noticed that some works, e.g. this paper (http://cn.arxiv.org/pdf/1806.00451.pdf), argue that selecting models and tuning parameters on the test set is harmful for generalization, and that this is a relatively common problem in the community (see the quotation below), including in many popular repos, e.g. repo1, etc.

> we typically have limited access to new data from the same distribution. It is now commonly accepted to re-use the same test set multiple times throughout the algorithm and model design process.

(referring to the Introduction of that work)

We admit that this is indeed an urgent problem for the community, and it is time to correct it. Thank you very much for the reminder. We will not use this practice in papers.