I would like to train skeleton-based action recognition models on a custom dataset.
However, I have a couple of questions:
Two models are supported right now: PoseC3D and STGCN++.
What should the data collection format be? For example:
Assume the activity is talking on the phone.
A 10-second video is collected in which talking on the phone happens from t=4 to t=6.
Should the final video from which the pose is extracted contain only the clip from t=4 to t=6?
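To make the trimming question concrete, here is a minimal sketch of what I mean by keeping only the t=4 to t=6 clip before pose extraction. The 30 fps frame rate and the helper name `clip_frame_range` are my own assumptions for illustration and do not come from the library.

```python
def clip_frame_range(start_s, end_s, fps, total_frames):
    """Return (first, last_exclusive) frame indices for the clip [start_s, end_s)."""
    # Convert timestamps to frame indices and clamp them to the video length.
    first = max(0, int(round(start_s * fps)))
    last = min(total_frames, int(round(end_s * fps)))
    if first >= last:
        raise ValueError("clip is empty for the given timestamps")
    return first, last


# A 10-second video at an assumed 30 fps has 300 frames.
first, last = clip_frame_range(4.0, 6.0, fps=30, total_frames=300)
print(first, last)  # 120 180
```

In this sketch, only frames 120 to 179 would be passed to pose extraction, and the remaining frames would be discarded.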
Should there be only one person in the frame? If there are two people in the video, one talking on the phone and the other doing nothing, is that a valid scenario, or should all collected data have only one person in the frame?
Does the collected data need to be in NTU format, Kinetics format, or some other format?
Once the final skeleton dataset is prepared, will it work for both models, i.e. PoseC3D and STGCN++?