The doc issue
Good morning everyone. I want to point out that `ntu_pose_extraction.py` is not working as expected; moreover, for me it only runs on the CPU.

I would also like to understand how, starting from raw videos, we can obtain spatio-temporally annotated clips to use as input features, i.e. the skeleton sequences that feed the 2D through 4D streams (as described in the paper).

My question is: how can we convert these videos into annotations the model can consume? Creating pickles seems a bit odd, since it is stated that they should all be merged together to train the final model, but by which criteria?

An example of how to create a small dataset and train on it would be very helpful; if you want, I could help build one for future reference.
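For context on what I mean by "merging", here is a minimal sketch of how I imagine the per-video pickles could be combined into one annotation file with a train/val split. The file layout, the `frame_dir` key, and the split criterion are my assumptions, not something from the official tooling:

```python
# Hypothetical sketch: merge per-video pose pickles (one dict per video,
# assumed to contain at least a 'frame_dir' identifier) into a single
# annotation file with a 'split' mapping and an 'annotations' list.
# The deterministic every-Nth-clip validation split is an assumption.
import pickle
from pathlib import Path


def merge_pose_pickles(pkl_dir: str, out_file: str, val_ratio: float = 0.2) -> None:
    annotations = []
    for pkl_path in sorted(Path(pkl_dir).glob("*.pkl")):
        with open(pkl_path, "rb") as f:
            annotations.append(pickle.load(f))  # one dict per video

    # Simple deterministic split: every Nth clip goes to validation.
    step = max(int(1 / val_ratio), 1)
    val = [a["frame_dir"] for i, a in enumerate(annotations) if i % step == 0]
    train = [a["frame_dir"] for i, a in enumerate(annotations) if i % step != 0]

    merged = {"split": {"train": train, "val": val}, "annotations": annotations}
    with open(out_file, "wb") as f:
        pickle.dump(merged, f)
```

Something along these lines, but documented with the actual expected keys and split criteria, is what would make the pipeline clear.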
Thanks a lot.
Gianluca
Suggest a potential alternative/fix
No response