Open Xinxinatg opened 1 year ago
Hi, thanks for your comment. The models are trained on a dataset in which at most 2 persons are in the scene performing an action. If there are more than 2 persons, the model selects the 2 of them closest to the camera. It could possibly be used for crowded scenes if we trained the model on such a dataset.
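A minimal sketch of that "keep the 2 persons closest to the camera" step, assuming skeleton size in the image (bounding-box height) as a proxy for distance; the repo's actual selection criterion may differ:

```python
# Hypothetical selection heuristic: when more than 2 skeletons are
# detected, keep the 2 that appear largest in the image, on the
# assumption that larger skeletons are closer to the camera.

def select_closest(skeletons, k=2):
    """skeletons: list of skeletons, each a list of (x, y) joints.
    Returns the k skeletons with the largest vertical extent."""
    def bbox_height(joints):
        ys = [y for _, y in joints]
        return max(ys) - min(ys)
    return sorted(skeletons, key=bbox_height, reverse=True)[:k]

# Example: three detected skeletons; the two larger ones are kept.
near = [(0, 0), (0, 200)]    # tall in the image -> presumed near
mid  = [(50, 0), (50, 120)]
far  = [(90, 0), (90, 40)]   # small in the image -> presumed far
print(select_closest([far, near, mid]))  # -> [near, mid]
```

If per-joint depth were available (e.g. from a depth camera), sorting by mean joint depth instead of image size would be a more direct version of the same idea.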
I want to know how the algorithm classifies the behavior of each human skeleton separately in a multi-person action recognition scenario.
My question is: for N human skeleton sequences in a video, should the algorithm be executed N times, once per skeleton, or executed once on all N skeleton sequences together to produce N classification results? And what if there is interaction between two people?
Thanks for your work; this repo might be one of the few that achieve real-time inference for skeleton-based action recognition. I saw in the demo that only one person is in the scene, and I am wondering whether you have tried using the model in a crowded scene.