First, very nice work!
Following your paper, and the architecture diagram on page 3, the 3D reconstruction network takes in an action sequence of 2D joint keypoint locations with window_size = 243 and produces only one 3D skeleton. If that is the case, how do you end up producing the whole action sequence in 3D?
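To show what I mean, here is a rough sketch of how I imagine the full sequence could be recovered, i.e. sliding the 243-frame window one frame at a time and keeping the centre-frame prediction. The shapes, the edge padding, and the `model` placeholder are all my own guesses, not your actual code:

```python
import numpy as np

# Hypothetical shapes: T frames, J joints, 2D inputs, 3D outputs.
T, J, window = 500, 17, 243
keypoints_2d = np.random.randn(T, J, 2).astype(np.float32)

def model(window_2d):
    """Placeholder for the 3D reconstruction network: maps a
    (window, J, 2) slice of 2D keypoints to a single (J, 3) pose."""
    return np.zeros((J, 3), dtype=np.float32)

# Pad both ends so every frame can sit at the centre of a full
# 243-frame window (edge replication is just an assumption).
pad = window // 2
padded = np.concatenate([
    np.repeat(keypoints_2d[:1], pad, axis=0),
    keypoints_2d,
    np.repeat(keypoints_2d[-1:], pad, axis=0),
], axis=0)

# Slide the window one frame at a time; each call yields the 3D pose
# for the centre frame, so the full T-frame 3D sequence is recovered.
poses_3d = np.stack([model(padded[t:t + window]) for t in range(T)])
print(poses_3d.shape)  # (T, J, 3)
```

Is this roughly how the per-frame 3D outputs are assembled, or does the network emit the whole sequence in one pass?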
Also, what I found confusing is that in this issue, https://github.com/tobiascz/VideoPose3D/issues/4, you state that the input to the network is actually the video file?
Just wondering if you could clarify these two points for me? Thanks!