First, very nice work!
Following your paper, and the architecture diagram on page 3, the 3D reconstruction network takes in an action sequence of 2D joint keypoint locations with window_size = 243 and produces only one 3D skeleton. If that is the case, how do you end up producing the whole action sequence in 3D?
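To show what I mean, here is a rough sketch of how I imagine the full sequence could be recovered, i.e. sliding the 243-frame window one frame at a time and keeping the centre-frame prediction. The shapes, the edge padding, and the `model` placeholder are all my own guesses, not your actual code:

```python
import numpy as np

# Hypothetical shapes: T frames, J joints, 2D inputs, 3D outputs.
T, J, window = 500, 17, 243
keypoints_2d = np.random.randn(T, J, 2).astype(np.float32)

def model(window_2d):
    """Placeholder for the 3D reconstruction network: maps a
    (window, J, 2) slice of 2D keypoints to a single (J, 3) pose."""
    return np.zeros((J, 3), dtype=np.float32)

# Pad both ends so every frame can sit at the centre of a full
# 243-frame window (edge replication is just an assumption).
pad = window // 2
padded = np.concatenate([
    np.repeat(keypoints_2d[:1], pad, axis=0),
    keypoints_2d,
    np.repeat(keypoints_2d[-1:], pad, axis=0),
], axis=0)

# Slide the window one frame at a time; each call yields the 3D pose
# for the centre frame, so the full T-frame 3D sequence is recovered.
poses_3d = np.stack([model(padded[t:t + window]) for t in range(T)])
print(poses_3d.shape)  # (T, J, 3)
```

Is this roughly how the per-frame 3D outputs are assembled, or does the network emit the whole sequence in one pass?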
Also, what I found confusing is that in this issue, https://github.com/tobiascz/VideoPose3D/issues/4, you state that the input to the network is actually the video file?
Just wondering if you could clarify these two points for me? Thanks!