wwsource / SceneTracker

SceneTracker: Long-term Scene Flow Estimation Network

Questions about SceneTracker implementation and compatibility #10

Open W-Nabe opened 1 month ago

W-Nabe commented 1 month ago

I'm interested in SceneTracker and have several questions about its usage:

  1. Is it possible to run SceneTracker on my own mp4 files? If so, how should I specify the mp4 file for processing? (If there are any additional requirements besides the mp4 file, I would appreciate knowing about them.)
  2. I don't have an expensive GPU, but I do have an RTX 2070s with 8GB of VRAM. Is it possible to run SceneTracker on this GPU (perhaps by reducing memory usage)? Alternatively, is it possible to run it on Google Colab?
  3. My ultimate goal is to import tracking points into Blender for use in animating objects within Blender, similar to SpaTracker (https://github.com/henry123-boy/SpaTracker). Is this possible with SceneTracker?

I apologize if any of my questions are off-base, as I'm not very familiar with this field. I would greatly appreciate your responses when you have the time.

wwsource commented 1 month ago

Thank you for noticing our work! We are sorry for the late reply.

  1. The input of SceneTracker is an RGBD video. If you want to run on mp4 files, you first need to use a monocular depth estimation algorithm to predict the depth of each frame (see the frame-reading sketch after this list). This step is the same as for SpaTracker.
  2. I understand your situation. You can reduce video memory consumption by decreasing the number of points tracked at the same time.
  3. This is completely feasible. As long as an RGBD video is provided as input, our output format is the same as SpaTracker's.
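
As a starting point for point 1, a minimal sketch that reads an mp4 into per-frame RGB arrays with OpenCV; the file name is a placeholder, and the depth model itself is discussed later in this thread:

    import cv2

    def read_video_frames(path):
        """Read an mp4 into a list of HxWx3 RGB frames (uint8)."""
        cap = cv2.VideoCapture(path)
        frames = []
        while True:
            ok, frame_bgr = cap.read()
            if not ok:
                break
            # OpenCV decodes to BGR; most depth models expect RGB.
            frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
        cap.release()
        return frames

    frames = read_video_frames("rgb.mp4")  # hypothetical input file
    # Each frame is then fed to a monocular depth estimator to obtain a
    # per-frame depth map, which SceneTracker consumes alongside the RGB.
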
W-Nabe commented 1 month ago

Thank you for your response!

Regarding point 1, you mentioned that mp4 files can't be used directly and that "you need to first use a monocular depth estimation algorithm to predict the depth of each frame." Could you suggest specific tools that would be good for generating this?

For point 2, you mentioned "decreasing the number of synchronized tracking points." Could you provide more details on the specific steps to take and what code to write to achieve this?

As for point 3, I haven't been able to run SpaTracker on my GPU or Colab yet, so I haven't even generated anything. I'm not sure about the conversion process, so if I encounter any issues, I'll ask again.

I apologize, but my knowledge in this area is really limited, and for points 1 and 2, I can't envision the specific code or tools to use. Any advice you could provide would be greatly appreciated.

wwsource commented 1 month ago

  1. You can refer to ZoeDepth (see the sketch after this list).
  2. You can modify --track_point_num 256 in script/train_odyssey.sh to --track_point_num x, where x is a value small enough that tracking fits in your video memory.
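
A minimal sketch of point 1 applied per frame, assuming the torch.hub entry point and the infer_pil helper described in the ZoeDepth README (please verify the model variant and API there):

    import torch
    import numpy as np
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # "ZoeD_N" is one of the variants listed in the ZoeDepth README; assumed here.
    zoe = torch.hub.load("isl-org/ZoeDepth", "ZoeD_N", pretrained=True).to(device).eval()

    # frames: list of HxWx3 RGB uint8 arrays, e.g. from the frame-reading sketch above.
    depths = [zoe.infer_pil(Image.fromarray(f)) for f in frames]  # each HxW float32
    depths = np.stack(depths)  # (T, H, W), one depth map per video frame

How the depth values should be scaled and stored for SceneTracker is not spelled out in this thread, so check data/dataset.py for the expected format.
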
W-Nabe commented 1 month ago

Thank you for your reply!

I haven't been able to try it yet due to time constraints, but I'm planning to run it on WSL2 in Windows 11.

I apologize for taking up your time, but I'd like to confirm the overall execution procedure as it's still a bit unclear to me.

I fed the repository contents, questions, answers, and materials into Google AI Studio and asked for the execution procedure. The AI produced the procedure below. Is it correct?


Step 1: Generate depth maps

  1. Install ZoeDepth and confirm how to run it.
  2. Run ZoeDepth on the prepared mp4 video and generate depth maps for each frame.
  3. Save the depth maps in a format compatible with SceneTracker's input format (e.g., .npz format).
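
For step 3, a minimal sketch of saving the stacked depth maps; the array key inside the .npz is only a guess and should be checked against what data/dataset.py actually reads:

    import numpy as np

    # depths: (T, H, W) float32 array of per-frame depth maps (see the sketches above).
    np.savez("data/demo/my_video/deps.npz", deps=depths)  # key "deps" is an assumption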

Step 2: Run SceneTracker

If necessary, modify --track_point_num 256 in script/train_odyssey.sh to --track_point_num x.

  1. Create a new folder in the data/demo directory and place the video and depth maps there.
    data/demo/my_video/
    ├── rgb.mp4
    └── deps.npz
  2. Edit data/dataset.py and change the data_root variable in the WWOdyssey class to include data/demo/my_video.
    data_root = 'data/demo/my_video'  # Modified part
    odyssey_root = data_root + 'LSFOdyssey'
    driving_root = data_root + 'LSFDriving'
  3. Edit run_demo.py and change the call to the validate_odyssey function as follows:
    run_test.validate_odyssey(model, split='demo', seq_name='my_video')  # Modified part
  4. Edit run_test.py and add the following code at the beginning of the validate_odyssey function:
    if seq_name:  # Added part
        val_set = WWOdyssey(seq_len=-1, track_point_num=-1, split=split, seq_name=seq_name)
    else:
        val_set = WWOdyssey(seq_len=-1, track_point_num=-1, split=split)
  5. Run run_demo.py.
    python run_demo.py
  6. When the execution is complete, a track.npz file will be generated in the data/demo/my_video folder. This file contains the 3D coordinates of the tracked points.
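
If the run succeeds, a quick way to check what the resulting track.npz actually contains (the key names are not documented in this thread, so they are printed rather than assumed):

    import numpy as np

    data = np.load("data/demo/my_video/track.npz")
    for key in data.files:
        print(key, data[key].shape)  # e.g. per-point trajectories over time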

Also, how can I rewrite the code to specify the points to be tracked?