Data preparation

faithbotbbot commented 2 weeks ago

Hello, thank you for your contribution. I would like to ask you three questions.

When I use the official script to convert the videos to RGB images at 30 FPS, it is very slow. Is this normal? (It takes about two days to process all the videos.) Also, how much storage space does the processed data occupy?
Can you list the directory structure of the training data required by VisTracker in detail?
The paper does not mention training time. How long does it take to train VisTracker on an A100 80G?

xiexh20 commented 5 days ago

Hi, thank you for your interest.

Yes, this is normal. There are ~1.2k videos in total to process, so it can take a long time. I usually run this kind of job on multiple CPUs in parallel.
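As a minimal sketch of that parallelization, assume the frames are extracted with ffmpeg. The official BEHAVE script may do this differently internally. The helper names (`build_ffmpeg_cmd`, `extract_one`, `extract_all`) and the `frame_%06d.jpg` naming scheme are hypothetical. With those assumptions, one video per worker can be dispatched like this:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_ffmpeg_cmd(video: Path, out_dir: Path, fps: int = 30) -> list:
    """Build an ffmpeg command that dumps frames of `video` into `out_dir`.

    The output naming (frame_%06d.jpg) is a hypothetical choice, not the
    naming used by the BEHAVE/VisTracker scripts.
    """
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-i", str(video),
        "-vf", f"fps={fps}",   # resample to the requested frame rate
        "-q:v", "2",           # high JPEG quality
        str(out_dir / "frame_%06d.jpg"),
    ]


def extract_one(video: Path, out_root: Path, fps: int = 30) -> int:
    """Extract one video into out_root/<video stem>/ and return ffmpeg's exit code."""
    out_dir = out_root / video.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_ffmpeg_cmd(video, out_dir, fps)).returncode


def extract_all(videos, out_root: Path, fps: int = 30, workers: int = 8, job=extract_one):
    """Run `job` for every video using a pool of `workers` threads."""
    # The heavy decoding happens inside separate ffmpeg processes,
    # so a thread pool is enough to keep several CPU cores busy.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: job(v, out_root, fps), videos))
```

For example, `extract_all(sorted(Path("videos").glob("*.mp4")), Path("frames"), workers=16)` would start up to 16 ffmpeg processes at once. Set `workers` to roughly the number of CPU cores you can spare. The output layout here will not match BEHAVE's, so adapt the paths to what the dataloader expects.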
I basically follow the same structure as the original BEHAVE dataset. Specifically, training uses the following data:
- RGB images and masks: same format as in BEHAVE.
- GT SMPL and object meshes: UDF sampling is done online, so you need these meshes; see the paths in the dataloader here: https://github.com/xiexh20/VisTracker/blob/main/data/traindata_online.py#L93. You can obtain them by parsing the human and object pose parameters in the npz files with the tools here: https://github.com/xiexh20/behave-dataset/tree/main/tools
- Pre-rendered triplane images: see the dataloader here: https://github.com/xiexh20/VisTracker/blob/main/data/traindata_online.py#L87. You can render them from the GT SMPL meshes with this script.
- Pre-computed object visibility: packed in this file; see also the documentation.
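"Online" UDF (unsigned distance field) sampling means that query points and their distances to the GT meshes are generated on the fly in the dataloader instead of being precomputed. The sketch below is a generic illustration only. It assumes a mesh given as numpy `vertices`/`faces` arrays. The function names and defaults (`sigma`, `n_surface`, `chunk`) are hypothetical and not taken from traindata_online.py, which may use different sampling strategies or exact point-to-mesh distances.

```python
import numpy as np


def sample_surface(vertices, faces, n, rng):
    """Area-weighted uniform sampling of n points on a triangle mesh."""
    tri = vertices[faces]  # (F, 3, 3) triangle corners
    areas = 0.5 * np.linalg.norm(
        np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]), axis=1)
    idx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    # Uniform barycentric coordinates via the square-root trick
    r1, r2 = rng.random(n), rng.random(n)
    s = np.sqrt(r1)
    u, v, w = 1.0 - s, s * (1.0 - r2), s * r2
    t = tri[idx]
    return u[:, None] * t[:, 0] + v[:, None] * t[:, 1] + w[:, None] * t[:, 2]


def sample_udf(vertices, faces, n_query=2048, n_surface=10000,
               sigma=0.01, seed=0, chunk=128):
    """Return near-surface query points and their approximate unsigned distances.

    Queries are surface samples perturbed by Gaussian noise (sigma in mesh
    units, i.e. meters for SMPL). The distance is taken to the nearest dense
    surface sample (brute force), so it can only overestimate the true
    point-to-mesh distance.
    """
    rng = np.random.default_rng(seed)
    surface = sample_surface(vertices, faces, n_surface, rng)
    queries = sample_surface(vertices, faces, n_query, rng)
    queries = queries + rng.normal(scale=sigma, size=queries.shape)

    udf = np.empty(len(queries))
    for start in range(0, len(queries), chunk):
        q = queries[start:start + chunk]
        # (chunk, n_surface) distances; keep the nearest surface sample per query
        dist = np.linalg.norm(q[:, None, :] - surface[None, :, :], axis=-1)
        udf[start:start + chunk] = dist.min(axis=1)
    return queries, udf
```

Because the queries are drawn fresh each time the function is called, every epoch sees different points around the same GT meshes. This is why only the meshes, not precomputed distances, need to be on disk.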
For training, I used 4 GTX8000 GPUs, and it took around 35 hours to converge. One A100 80G is roughly equivalent to 2 GTX8000, so I would guess it takes ~70 hours.
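The ~70-hour figure follows from keeping the total GPU-hours constant. The minimal sketch below assumes throughput scales linearly with the number of GPU equivalents. Real multi-GPU scaling is usually somewhat worse. `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(ref_hours: float, ref_gpus: float, target_gpu_equiv: float) -> float:
    """Scale a measured training time to a different amount of compute.

    Assumes perfectly linear scaling, i.e. total GPU-hours stay constant.
    """
    gpu_hours = ref_hours * ref_gpus  # e.g. 35 h x 4 GTX8000 = 140 GPU-hours
    return gpu_hours / target_gpu_equiv


# One A100 80G counted as ~2 GTX8000 (the rough equivalence stated above)
print(estimate_hours(35, 4, 2))  # 70.0
```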

faithbotbbot commented 3 days ago

Thank you so much for your detailed and helpful response! Your guidance is invaluable for my project. Best regards!

xiexh20 / VisTracker