open-mmlab / mmskeleton

An OpenMMLAB toolbox for human pose estimation, skeleton-based action recognition, and action synthesis.
Apache License 2.0

Training the model with NTU-RGB-D 2D keypoints from OpenPose gives poor results #82

Closed hongge831 closed 5 years ago

hongge831 commented 6 years ago

Hi there, I want to train the ST-GCN model with NTU-RGB-D 2D keypoints computed by OpenPose, but after more than 60 epochs of training I got poor results. Have you ever tried training your model with NTU-RGB-D's 2D skeleton data? If yes, would you mind sharing your results and any training tricks? Hoping for your reply. Thanks a lot.

hongge831 commented 6 years ago

And I have another question. I noticed that the NTU-RGB-D videos are short, so I want to know how to set the parameter T: should I still set it to 300, or should I set it smaller?

hongge831 commented 6 years ago

@yysijie @yjxiong

yysijie commented 6 years ago

@hongge831 Hi, thank you for your interest in our work.

Sorry, we only used the full 3D data for training on NTU-RGB-D. What accuracy did you achieve? For more details, you can check out our codebase at v0.1.0.

On the second question, we simply extend each sequence to 300 frames by padding with zeros.
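A minimal sketch of that padding, assuming the channel-first (C, T, V, M) per-sample layout used elsewhere in this thread; the helper name `pad_to_T` is hypothetical:

```python
import numpy as np

def pad_to_T(seq, T=300):
    """Right-pad a (C, t, V, M) skeleton sequence with zeros up to T frames."""
    C, t, V, M = seq.shape
    out = np.zeros((C, T, V, M), dtype=seq.dtype)
    out[:, :t] = seq  # copy the real frames; the tail stays all-zero
    return out
```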

hongge831 commented 6 years ago

@yysijie thanks for your reply. I only got 30%+ top-5 accuracy, and I have not figured out where the problem is yet. Can you tell me how much the parameter T influences the result? If I set T to 300, the zero padding is very large for NTU-RGB-D's videos. Have you run any experiments to explore this? One more question: are the Kinetics skeleton datasets you provide computed with OpenPose?

yysijie commented 6 years ago

It's strange. You can follow the steps below:

  1. Use our script to build the preprocessing database: python tools/ntu_gendata.py --data_path <path to nturgb+d_skeletons>. The new database will be saved under ./data/NTU-RGB-D/xview and ./data/NTU-RGB-D/xsub as .npy files.
  2. Modify these .npy files so that one dimension of all pose tracks is all zeros (a sketch follows this list).
  3. Run python main.py recognition --config config/st_gcn/ntu-xview/train.yaml
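A minimal sketch of step 2, assuming the generated arrays are named train_data.npy/val_data.npy and have shape (N, C, T, V, M) with channel order (x, y, z); zeroing the z channel is what the later replies in this thread describe as throwing away depth:

```python
import numpy as np

for split in ('train', 'val'):
    path = './data/NTU-RGB-D/xsub/{}_data.npy'.format(split)
    data = np.load(path)   # expected shape: (N, 3, T, V, M), channels (x, y, z)
    data[:, 2] = 0         # zero the z (depth) channel of every pose track
    np.save(path, data)
```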

On your first question: in Kinetics-skeleton, we only observed a 0.5% accuracy improvement when we increased T from 128 to 256. On your second question: yes, we use OpenPose to extract pose tracks from videos.

hongge831 commented 6 years ago

@yysijie thanks for your reply. What does 'Modify these .npy files, make one dimension of all pose tracks be zeros' mean? Can I just set one dimension of the NTU-RGB-D 3D coordinates to zero and then train the model? Is the third dimension of the joint coordinates not that important? One more question: would you mind telling me how you set OpenPose's parameters (e.g. --net_resolution) when you applied it to the Kinetics videos? I am looking forward to your reply, thanks a lot.

yysijie commented 6 years ago

I think the third dimension is important, but the top-5 accuracy shouldn't be so low (30%+) even if you only use 2D information. This simple operation throws away the depth information without changing our settings. On your second question, we used the default parameters directly, but we transformed all videos to 256*340 resolution and 30 fps before the pose estimation.
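The thread doesn't show how the videos were transformed; a hypothetical sketch using ffmpeg (not part of the repo), assuming 340 is the width and 256 the height:

```python
import subprocess

def to_openpose_input(src, dst):
    """Resize a video to 340x256 and resample it to 30 fps before pose estimation."""
    subprocess.run([
        'ffmpeg', '-i', src,
        '-vf', 'scale=340:256',  # scale=W:H; the W/H ordering here is an assumption
        '-r', '30',              # resample to 30 fps
        dst,
    ], check=True)
```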

hongge831 commented 6 years ago

@yysijie Thank you very much, I'll try it. I also sent you an email (ys016@ie.cuhk.edu.hk); I hope you have received it. :)

hongge831 commented 6 years ago

@yysijie What's the difference between the Kinetics coordinates and the NTU-RGB-D 3D coordinates? I noticed that the NTU-RGB-D 3D coordinates (x and y) include negative values.

yysijie commented 6 years ago

@hongge831, We normalized the NTU-RGB-D data, so some values are negative. For more details about how the data is preprocessed, you can refer to https://github.com/yysijie/st-gcn/blob/master/tools/ntu_gendata.py and https://github.com/yysijie/st-gcn/blob/master/tools/kinetics_gendata.py
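Purely illustrative (the actual steps live in the linked gendata scripts): centering each frame on a root joint is one common normalization that naturally produces negative coordinates; the joint index below is an assumption:

```python
import numpy as np

def center_on_root(seq, root=1):
    """seq: (C, T, V, M) skeleton sequence; subtract a root joint so
    coordinates become relative (and hence can be negative)."""
    return seq - seq[:, :, root:root + 1, :]
```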

hongge831 commented 6 years ago

@yysijie Thank you very much. I have figured out what happened: your codebase works well with NTU-RGB-D 2D keypoints (extracted from NTU-RGB-D with OpenPose). I get 96.76% top-5 accuracy and 76.01% top-1 accuracy on the xsub benchmark. The problem I met before was that I had modified your kinetics_gendata.py incorrectly. I will now go deeper into your paper and explore graph CNNs for action recognition. If I come across any problems, I hope to discuss them with you. Thanks again!

yysijie commented 5 years ago

@hongge831, that's great!

s4365g commented 5 years ago

Hi @hongge831, I have been working on the same task recently. Could you share your NTU-RGB-D 2D skeleton data? Also, did you load pretrained weights when you trained the new ST-GCN model on the NTU-RGB-D 2D skeleton data? If yes, how do you load the pretrained weights? Thanks ^^
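This question went unanswered in the thread; below is just the generic PyTorch loading pattern. The module path, constructor arguments, and checkpoint name are assumptions based on the yysijie/st-gcn codebase, not a confirmed recipe:

```python
import torch
from net.st_gcn import Model  # module path in the yysijie/st-gcn codebase

# Constructor arguments and checkpoint filename here are assumptions.
model = Model(in_channels=3, num_class=60,
              graph_args={'layout': 'ntu-rgb+d', 'strategy': 'spatial'},
              edge_importance_weighting=True)
state = torch.load('st_gcn.ntu-xsub.pt', map_location='cpu')
model.load_state_dict(state, strict=False)  # strict=False tolerates mismatched heads
```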

xardon17 commented 1 year ago

Hi @hongge831, I have been working on the same task recently. Could you share your NTU-RGB-D 2D skeleton data? Or do you still have the trained model weights? Thanks a lot.