kenziyuliu / DGNN-PyTorch

Unofficial PyTorch implementation of the CVPR'19 paper "Skeleton-Based Action Recognition with Directed Graph Neural Networks".

Doubts regarding pre-processing of model #2

Closed Anirudh257 closed 5 years ago

Anirudh257 commented 5 years ago

Hey, firstly thanks a lot for implementing the code and making it public. I am trying to replicate the results on the NTU 120 dataset but am unable to get good accuracy for the motion stream. I re-read the paper and found that I had missed the pre-processing step quoted below:

The body tracker of Kinect is prone to detecting more than 2 bodies, some of which are objects. To filter the wrong bodies, we first define the energy of each body as the summation of the skeleton's standard deviation across each channel. We then select two bodies in each sample according to their body energies. Subsequently, each sample is normalized and translated to the central perspective, which is the same approach as that used earlier.

1) In your data pre-processing code, I can't find this step. Is it required for replicating the results, or can we still achieve good accuracy without it?

2) While testing the spatial and motion streams, is it important that we test the model trained at the last epoch (the 50th), or can we use the checkpoint that gives the best accuracy?

kenziyuliu commented 5 years ago

Hi,

Thanks for raising the issue.

  1. The quote from the paper (most likely) refers to the "get_nonzero_std" and "read_xyz" functions in "data_gen/ntu_gen_joint_data.py"; see the sketch after this list. I believe this technique can be applied to both NTU datasets as well as the Kinetics dataset (for which I didn't implement data generation). I'm unsure whether this preprocessing step makes a huge difference, though.

  2. I think in general you can use the checkpoint that gives the best accuracy.
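
A minimal NumPy sketch of the body-energy filtering idea behind "get_nonzero_std" (the array shapes and helper names here are my assumptions, not the repo's exact code):

```python
import numpy as np

def body_energy(body):
    """Energy of one tracked body: the summation of the skeleton's
    standard deviation across each channel, computed over the frames
    in which the body is actually detected."""
    valid = body.sum(axis=(1, 2)) != 0      # frames with any nonzero joint
    if not valid.any():
        return 0.0
    # std over (frames, joints), then summed over the C coordinate channels
    return body[valid].std(axis=(0, 1)).sum()

def top_two_bodies(detections):
    """Keep the two detections with the highest energy, since Kinect may
    report more than two 'bodies', some of which are objects."""
    return sorted(detections, key=body_energy, reverse=True)[:2]

# Example: three detections of shape (T, V, C) = (300 frames, 25 joints, 3 channels)
detections = [np.random.randn(300, 25, 3), np.zeros((300, 25, 3)), np.random.randn(300, 25, 3)]
kept = top_two_bodies(detections)
```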

Anirudh257 commented 5 years ago

Thanks @kenziyuliu for replying. I raised this issue because I was getting only around 50-60% accuracy for the motion stream under both the cross-subject and cross-setup protocols. I have followed all the steps but am still unable to get good accuracy.

I have also tried the 2s-AGCN code and it runs like a charm; this problem does not occur there.
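
For reference, the motion-stream input is derived from the joint data as frame-wise temporal differences, the same convention 2s-AGCN uses; a minimal sketch, assuming the (N, C, T, V, M) layout of the NTU data generators:

```python
import numpy as np

def gen_motion(joint_data):
    """Motion-stream input: temporal differences between consecutive
    frames of the spatial data. Shape (N, C, T, V, M) = samples,
    channels, frames, joints, bodies; the last frame is left as zeros."""
    motion = np.zeros_like(joint_data)
    motion[:, :, :-1] = joint_data[:, :, 1:] - joint_data[:, :, :-1]
    return motion
```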

kenziyuliu commented 5 years ago

Hi, would you be able to share the training curves for the motion stream? I tried to reproduce it myself too, but I was getting bad results like yours, and I haven't tried again since. I suspect it is likely caused by hyperparameter choices.

Anirudh257 commented 5 years ago

These are the plots:

1) Training accuracy vs. iterations [image]

2) Validation accuracy vs. iterations [image]

3) Training loss vs. iterations [image]

4) Validation loss vs. iterations [image failed to upload]

Regarding the hyperparameters, I used the ones given in the code.

kenziyuliu commented 5 years ago

I got similar results when reproducing the motion stream, but since not all hyperparameters/tricks were provided in the paper, and given hardware resource limits, I didn't investigate further. I guess you might need to try tuning the hyperparameters, especially the regularization and learning rates, and see if you can obtain more sensible results.
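
As a starting point for such a sweep, a minimal sketch; the optimizer and schedule values below are illustrative guesses, not settings from the paper:

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the DGNN motion-stream model;
# the hyperparameter values are illustrative starting points for a
# sweep, not the paper's (unpublished) settings.
model = nn.Linear(10, 60)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.05,            # sweep over e.g. {0.1, 0.05, 0.01}
    momentum=0.9,
    weight_decay=5e-4,  # regularization strength; try {1e-4, 5e-4, 1e-3}
    nesterov=True,
)
# Step-decay schedule over a 50-epoch run like the one mentioned above
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 40], gamma=0.1
)
```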