yohanshin / WHAM

MIT License

Output format and chunks #114

Closed aidilayce closed 2 months ago

aidilayce commented 2 months ago

Hello, thank you for the great work! The outputs are truly smooth and plausible. However, I have a question regarding the output format.

I saved the output in pkl format. The file contains 4 chunks with keys 0, 4, 17, and 31. Since each chunk also has frame_ids, I concatenated the chunks in frame_ids order to get the whole sequence as one variable. Then, using "pose_world" and "trans_world", I computed joint positions with SMPL and visualised them in the world frame in a physics simulator. However, the world trajectory appears to break at the end of each chunk. For instance, the person starts at point A and finishes at point B at the end of chunk '0', but chunk '4' starts at point C instead of B. Apart from the trajectory, the body pose is correct and smooth.

So, should the chunks not be concatenated according to the frame_ids, or is additional preprocessing needed? Any recommendation is welcome, thanks!

hansen-yi commented 2 months ago

Hi, I apologize that this isn't a solution to your problem but I was hoping you would be able to explain to me how you exported the output and accessed the joint positions.

aidilayce commented 2 months ago

Hi, no worries.

While running the demo, I added the --save_pkl option: python demo.py --video jump.mp4 --save_pkl --visualize --run_smplify. With this option, the result should be saved as wham_output.pkl in a folder named output, together with the other results, slam_results.pth and tracking_results.pth. Then, I loaded the wham_output.pkl file with joblib. Since the output has "pose_world", "trans_world" and "betas" keys, I passed them to an SMPL model and obtained the joints from smpl_output.joints.
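
For reference, the loading and SMPL steps described above could look roughly like the sketch below. It rests on several assumptions that are not confirmed in this thread: that "pose_world" stores 72 axis-angle values per frame (3 for the root orientation, 69 for the body), that the smplx package is used as the SMPL implementation, and that SMPL model files are available locally. The helper names split_pose and joints_for_person are made up for illustration.

```python
def split_pose(pose_world):
    """Split per-frame 72-value axis-angle vectors into (global_orient, body_pose).

    Assumes each frame holds 3 root-orientation values followed by 69 body values.
    """
    global_orient, body_pose = [], []
    for frame in pose_world:
        if len(frame) != 72:
            raise ValueError(f"expected 72 pose values per frame, got {len(frame)}")
        global_orient.append([float(v) for v in frame[:3]])
        body_pose.append([float(v) for v in frame[3:]])
    return global_orient, body_pose


def joints_for_person(pkl_path, smpl_model_dir, person_id):
    """Load one person's result and return SMPL joints in the world frame."""
    # Imported lazily so split_pose can be used without these packages installed.
    import joblib
    import smplx
    import torch

    results = joblib.load(pkl_path)  # dict keyed by person id
    person = results[person_id]
    global_orient, body_pose = split_pose(person["pose_world"])

    # smpl_model_dir is a hypothetical path to locally downloaded SMPL model files.
    smpl = smplx.SMPL(model_path=smpl_model_dir)
    smpl_output = smpl(
        global_orient=torch.tensor(global_orient, dtype=torch.float32),
        body_pose=torch.tensor(body_pose, dtype=torch.float32),
        betas=torch.tensor(person["betas"], dtype=torch.float32),
        transl=torch.tensor(person["trans_world"], dtype=torch.float32),
    )
    return smpl_output.joints  # shape: (num_frames, num_joints, 3)
```

Note that the function works on one person id at a time, so results from different keys are never mixed.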

hansen-yi commented 2 months ago

Ohh thank you so much!

aidilayce commented 2 months ago

So, as far as I understand from other issues, the keys 0, 4, 17, and 31 represent person ids in the given video. However, my videos contain only a single person, yet the detector outputs multiple people. Has anyone come up with a workaround?
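
Because each key is a separate person track rather than a time chunk, it can help to inspect the tracks before deciding how to use them. The sketch below assumes only what is described in this thread (a dict keyed by person id, each entry holding "frame_ids"); the example dict is a made-up stand-in for the loaded wham_output.pkl, and the function names are hypothetical.

```python
def summarize_tracks(results):
    """Return one (person_id, num_frames, first_frame, last_frame) tuple per key."""
    summary = []
    for person_id, track in results.items():
        frame_ids = track["frame_ids"]
        summary.append(
            (person_id, len(frame_ids), int(frame_ids[0]), int(frame_ids[-1]))
        )
    return summary


def longest_track_id(results):
    """Return the person id whose track covers the most frames."""
    return max(results, key=lambda pid: len(results[pid]["frame_ids"]))


# Made-up stand-in for the dict loaded from wham_output.pkl.
example = {
    0: {"frame_ids": list(range(0, 40))},
    4: {"frame_ids": list(range(38, 90))},
    17: {"frame_ids": list(range(90, 130))},
    31: {"frame_ids": list(range(128, 150))},
}

for row in summarize_tracks(example):
    print(row)
print("longest track:", longest_track_id(example))
```

Overlapping or back-to-back frame ranges across ids suggest that the tracker split one person into several tracks, or that extra detections were tracked as separate people. In either case the world trajectories of different ids are estimated independently, so concatenating them would not be expected to give a continuous path.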

aidilayce commented 2 months ago

Solved! Turns out this is not a WHAM issue, but a YOLO issue.

For anyone who encounters the same issue: YOLO was detecting bboxes for people who were not there. I increased the YOLO detector's BBOX_CONF threshold from 0.5 to 0.8 and obtained results for a single person.
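
To illustrate why raising the threshold helps, here is a minimal, self-contained sketch of confidence filtering. It is not WHAM's detector code: the detection values are invented, and filter_detections is a hypothetical helper. Only the name BBOX_CONF and the values 0.5 and 0.8 come from this thread.

```python
OLD_BBOX_CONF = 0.5  # previous threshold mentioned above
BBOX_CONF = 0.8      # raised threshold mentioned above


def filter_detections(detections, conf_threshold):
    """Keep only detections whose confidence reaches the threshold."""
    return [d for d in detections if d["conf"] >= conf_threshold]


# Invented detections for one frame: one real person, two spurious boxes.
detections = [
    {"bbox": (120, 40, 310, 470), "conf": 0.93},
    {"bbox": (500, 60, 560, 200), "conf": 0.58},
    {"bbox": (10, 300, 70, 420), "conf": 0.52},
]

print(len(filter_detections(detections, OLD_BBOX_CONF)))  # 3 boxes pass at 0.5
print(len(filter_detections(detections, BBOX_CONF)))      # 1 box passes at 0.8
```

The trade-off is that a higher threshold can also drop a real person in frames where the detector is less confident (occlusion, motion blur), so the value may need tuning per video.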