RohaanA opened this issue 9 months ago
Hi @RohaanA,
Thank you for your continuous interest in WHAM, and I am glad that it finally worked out :).
- The assignment of the subject whose global motion is rendered is done through this line. The current method simply selects the subject who appears in the video for the longest time. You could write a small piece of code to assign the subject you want to visualize instead (see the sketch after this list). But since your video contains multiple shots rather than a single-camera take, I would recommend splitting the video into individual shots (i.e., continuous footage taken by one camera).
- Yes, it takes quite a long time to run on Google Colab (with a free GPU). I tested the model with various GPUs (A100, RTX 3090, ...), and those allow reasonably fast inference; this could be even faster if we run in batch mode, but I currently don't have such an implementation to support a multi-GPU environment. You could try installing it on your local machine if you have an NVIDIA GPU with more than 12 GB of memory.
- I will modify the demo code to save the output in the same format as VIBE. That should allow you to run the same script. Give it a shot later this week!
- If you have a YOLO model that detects only the players, I believe that should work. You can try modifying this line to use your fine-tuned YOLO weights. Please let me know how it goes!
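For the first point, here is a minimal sketch of what assigning a specific subject could look like. It assumes the demo stores per-subject results in a dict keyed by track ID, with each entry carrying a 'frame_ids' array; the key names and file path are assumptions, so check the structure of your own output pickle.

```python
# A minimal sketch, not WHAM's actual code: pick a specific track to render
# instead of the longest one (key names and path are assumptions).
import joblib

results = joblib.load("output/demo/wham_output.pkl")  # placeholder path

# Current behaviour: the track that appears in the most frames.
longest_track = max(results, key=lambda sid: len(results[sid]["frame_ids"]))

# Alternative: hard-code the subject you actually want to visualize.
subject_id = 2  # hypothetical track ID of the player you care about
subject = results[subject_id]
print(f"Rendering subject {subject_id} "
      f"({len(subject['frame_ids'])} frames; longest track was {longest_track})")
```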
Thanks :D! I'm currently on vacation and will be back next week to try this out. If you want, you can keep this issue open until then, or I can make a new one when I report my findings.
Hey @yohanshin, I finally got back home and got to try out the --save_pkl flag. First, I noticed that with this flag the Colab environment is not able to fully complete the demo script.
I am not exactly sure what causes this, but after the 2D detection and feature extraction finish, the rendering part of the demo pipeline doesn't run.
However, a PKL file was still created by WHAM. I tried using it with the FBX script, but I got this error.
I haven't dug into the VIBE error, but I think it might be due to the demo script ending abruptly.
Hi @RohaanA
I just updated the visualization code; there was a variable name mismatch introduced when I fixed the output file format. Please try the rendering again.
I am not sure about the Blender part. I will check when I have a chance to go over it.
Thank you for the quick response! I think some problem occurred while saving/loading the pickle file.
Hey @yohanshin, I am terribly sorry: the pickle load failure was an error on my end. I had a much older version of joblib (0.14.0), while WHAM uses 1.3.2. Updating joblib fixed the issue!
I was able to get the poses into Blender.
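For anyone hitting the same thing, a quick sanity check along these lines can confirm that the pickle loads and show what each track contains; the output path below is an assumption, so use whatever your --save_pkl run produced.

```python
# Quick sanity check for the saved output (the path is a placeholder).
import joblib

print(joblib.__version__)  # an old joblib (e.g. 0.14.x) fails here; 1.3.x works

data = joblib.load("output/demo/wham_output.pkl")
for track_id, track in data.items():
    print(track_id, list(track.keys()))  # see which keys each subject carries
```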
Hi @RohaanA, can you please share your code for importing the output into Blender? Thanks in advance.
Sure, I am using VIBE's script to convert the output. My environment is on Windows with a Miniconda setup; here's the script: https://github.com/mkocabas/VIBE?tab=readme-ov-file#fbx-and-gltf-output-new-feature
To set up the script, you can follow the installation tutorial under that link.
@yohanshin First, I would like to express my appreciation for your fantastic work. Like @RohaanA, I used the VIBE script and turned the PKL output of WHAM into FBX. I rendered the FBX file in Blender and Unity. The problem is that the character is always centred in all the frames: all the poses and movements of the character happen at the same point. I have the same problem with VIBE. Do you have any suggestions to fix this?
This isn't actually a problem but intended behaviour, since the VIBE script, unlike WHAM, does not carry over positional (translation) information. I believe you would need to add the 3D world translation at a later step, since the script is not designed for it.
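If someone wants to add the translation back inside Blender after importing the FBX, a rough sketch could look like the following. It assumes the WHAM pickle stores a per-frame translation array under a key such as 'trans_world', that joblib is available in the Python Blender runs, and that the imported armature object is named "Armature"; all of these are assumptions to adapt to your own setup.

```python
# A rough sketch: key the root translation from the WHAM pickle onto the
# imported armature (key name, object name, and axis remap are assumptions).
import bpy
import joblib

data = joblib.load("/path/to/wham_output.pkl")
subject = next(iter(data.values()))       # first tracked subject
trans = subject["trans_world"]            # (num_frames, 3) translation array

armature = bpy.data.objects["Armature"]   # the imported FBX root; name may differ
for frame_idx, (x, y, z) in enumerate(trans, start=1):
    # SMPL-style data is usually y-up while Blender is z-up, so a remap like
    # (x, -z, y) is often needed -- verify the convention on your own data.
    armature.location = (float(x), -float(z), float(y))
    armature.keyframe_insert(data_path="location", frame=frame_idx)
```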
Hey @yohanshin,
I finally got around to testing the fine-tuned YOLO weights, and they do indeed work! As you can see in the video below, it now detects only the 2 players and not the other people in the video. It missed the far-side player at the end of the video, which means our custom model still needs some improvement, haha. I'd also like to share that using our own model (trained on yolov8s) instead of the default yolov8x brought the end-to-end rendering time down from 24 minutes to 12 minutes!
https://github.com/yohanshin/WHAM/assets/75722072/8dcd04fb-20d9-490c-b618-1c9437838a8d
I think this resolves all the queries I had for this issue thread. As a bonus for anyone with their own yolov8 models: you can enable tracking by replacing model.predict with model.track in detector.py (lib/models/preproc)!
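For reference, the two tweaks together could look roughly like this, assuming the detector in lib/models/preproc/detector.py wraps the ultralytics YOLO class (the weight file and video names below are placeholders):

```python
# A minimal sketch of the two tweaks discussed above, not WHAM's actual code.
from ultralytics import YOLO

# 1) Swap the default yolov8x weights for a fine-tuned, player-only model.
model = YOLO("players_yolov8s.pt")   # instead of YOLO("yolov8x.pt")

# 2) Use the built-in tracker instead of plain per-frame detection,
#    so each player keeps a consistent ID across frames.
results = model.track("tennis_clip.mp4", persist=True, verbose=False)
for r in results:
    print(r.boxes.id, r.boxes.xyxy)  # track IDs and bounding boxes per frame
```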
Hello @RohaanA, I think this might be interesting to you. I was able to make a script, based on the VIBE one, that can get the motion (including the translation) from the PKL file generated on Google Colab.
You can find instructions at: https://github.com/yohanshin/WHAM/issues/52
I recorded a video showing the process on Google Colab (which you already know) and how to use the script in Blender: https://www.youtube.com/watch?v=7heJSFGzxAI
And if you are on Windows and want to install it locally, I put the notes I wrote for myself here: https://github.com/yohanshin/WHAM/issues/53
Hope it helps.
Hey, that is really interesting! I'll definitely try this out and post my results here! :)
Hey everyone, I was experimenting with WHAM and tried converting the output to FBX. However, something seems wrong with the conversion: the hand position does not match the output rendered by WHAM. Do you have any idea how I could fix this? I am also using the VIBE script.
https://github.com/yohanshin/WHAM/assets/95719434/cfda6dc1-b6c4-47d6-be24-d6f7f5060c37
https://github.com/yohanshin/WHAM/assets/95719434/c53b2622-995b-4a69-a39c-384403590b78
@yohanshin thank you for this great repo. Would you happen to have any suggestions for improving the accuracy on this specific task of predicting the 3D pose of a golf swing (fine-tuning on domain-specific data, perhaps)? As you can see in the videos, the model struggles a little with the feet when there is a large twist in the body. For reference, here is the input video:
https://github.com/yohanshin/WHAM/assets/95719434/06f7c227-27b3-443d-9983-9dab296d3e4e
I sincerely thank @carlosedubarreto and @RohaanA for sharing various WHAM demo outputs and guidelines for Blender users.
Hi @jlnk03 ,
One way to improve the output quality is to run SMPLify as post-processing. I have made a few changes in this new branch, where you can run demo.py with the --run_smplify flag; it will run temporal SMPLify, similar to VIBE.
This is an example result when I ran the new demo on your video:
https://github.com/yohanshin/WHAM/assets/46889727/90516e5f-f33a-4626-9fe0-0296624d83e8
The new demo script improves the pixel alignment (and gives better 3D accuracy when I evaluated it on benchmarks), but the feet are still not perfect. What I can suggest is to get 2D keypoints of the foot joints (such as toes and heels) and add an additional reprojection loss function on those joints. That should resolve the issue. Let me know how it goes.
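To make the suggestion a bit more concrete, here is a hedged sketch of such an extra reprojection term in PyTorch. Every name in it is a placeholder rather than WHAM's actual API, and it assumes you already have projected model joints plus detected toe/heel keypoints with confidences in the same joint ordering.

```python
# A sketch of an extra, confidence-weighted reprojection term on foot joints.
import torch

def foot_reprojection_loss(pred_joints_2d, det_keypoints_2d, det_conf,
                           foot_idx, weight=10.0):
    """pred_joints_2d:   (T, J, 2) projected model joints
       det_keypoints_2d: (T, J, 2) detected 2D keypoints
       det_conf:         (T, J)    detection confidences
       foot_idx:         indices of toe/heel joints in both conventions"""
    diff = pred_joints_2d[:, foot_idx] - det_keypoints_2d[:, foot_idx]  # (T, F, 2)
    conf = det_conf[:, foot_idx].unsqueeze(-1)                          # (T, F, 1)
    # Robust (Geman-McClure-style) penalty on the squared pixel residual.
    sq = (diff ** 2).sum(dim=-1, keepdim=True)
    robust = sq / (sq + 100.0 ** 2)
    return weight * (conf * robust).mean()
```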
Hi @yohanshin,
Running SMPLify definitely helps to improve the output, thank you!
I added additional loss terms for the feet and hands to the forward pass of the SMPLifyLoss class (see picture), which basically just gives more weight to these specific joints. Was this what you meant? Unfortunately, it does not improve the output for me. As a reference, I plotted the 2D keypoints on the video.
Regarding the reprojection loss for the feet, how would you implement it for toes and heels, since the 2D detections follow the COCO convention and only contain the ankles?
https://github.com/yohanshin/WHAM/assets/95719434/cd20e3d4-cab5-427c-b231-3aa39810d1cd
I am also still struggling to get the Blender output to match the prediction. Below is the Blender output for the first frame of the first demo video on Colab (IMG_9732.mov). As you can see, the hand position and also the knee angle are not correct. I am not quite sure whether this is how WHAM outputs the joints or whether it is a mistake on my side. Maybe @RohaanA has an idea?
@yohanshin @carlosedubarreto @RohaanA Hi, thanks again for your fantastic work and contributions. I tried some videos captured with 360 cameras (Insta360 X2) and the results were better than I expected. However, when using equirectangular video, WHAM fails to estimate the correct position of the person in the real world; in other words, the 'trans' parameter calculated by WHAM is wrong for 360 videos. Do you have any idea how I can calculate this parameter correctly for 360 cameras?
Hello @MehranRastegarSani, I don't have experience with that part, but I remember there is a way to change the camera configuration, I think (it's my guess).
Maybe that is something that could help you get a better result. The part about calibration is on the main page; here is a screenshot.
Hi,
Finally, the Colab notebook has been released, and I got to try this model on tennis footage. I was really impressed by the model's performance on tennis videos; this is the most accurate model I have seen (and I have been looking for 3D models that perform well in tennis scenarios for the past few months)!
Kudos to the team for making such an excellent model! :) Here's a video of the model's performance, although it took me about 2 hours on Colab to produce it!
https://github.com/yohanshin/WHAM/assets/75722072/bcdcd8f0-7d74-4e4c-b1ef-67d918dd6de7
I have a few questions about this output for the team, which I will number below.
Once again, I was really impressed by the model's performance. I am sure this model will set a new benchmark for future HMR models :)