qianqianwang68 / omnimotion


Training and evaluating model on TAP-Vid DAVIS produces different results #37

Open AssafSinger94 opened 8 months ago

AssafSinger94 commented 8 months ago

Hello, I am trying to reproduce the OmniMotion results on TAP-Vid DAVIS. I preprocessed the data and trained the models using the default configs (except for using num_iters=200_000). However, when evaluating the trained models I get d_avg=63.5%, which is lower than the 67.5% reported in the paper. (My training and evaluation process is described in more detail below.)

Therefore I wanted to ask: do the default hyperparameters and configurations in the repo match the reported model? I also wanted to ask whether you have any code for evaluating OmniMotion on TAP-Vid. I had to write some code of my own (which I verified and fairly trust), but I think using your evaluation pipeline would be more reliable. :)

Thank you in advance! Assaf
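Editor's note: since the thread refers to d_avg (the TAP-Vid position-accuracy metric) without a shared implementation, here is a minimal sketch of how that metric is commonly computed: the fraction of visible query points whose prediction lies within 1, 2, 4, 8, and 16 pixels at 256x256 resolution, averaged over the thresholds. The array names and shapes are illustrative assumptions, not the code used by anyone in this thread.

```python
import numpy as np

def average_pts_within_thresh(pred_tracks, gt_tracks, gt_visible,
                              thresholds=(1, 2, 4, 8, 16)):
    """Sketch of the TAP-Vid d_avg metric (assumed shapes).

    pred_tracks, gt_tracks: (num_points, num_frames, 2) arrays of (x, y)
        positions in pixels at 256x256 resolution.
    gt_visible: (num_points, num_frames) boolean array, True where the
        ground-truth point is visible (only visible points are scored).
    """
    # Euclidean distance between predicted and ground-truth positions.
    dist = np.linalg.norm(pred_tracks - gt_tracks, axis=-1)  # (P, F)

    fractions = []
    for thr in thresholds:
        within = (dist < thr) & gt_visible
        # Fraction of visible points localized within this threshold.
        fractions.append(within.sum() / max(gt_visible.sum(), 1))
    return float(np.mean(fractions))
```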

AssafSinger94 commented 8 months ago

For each TAP-Vid DAVIS video I apply the following:

AssafSinger94 commented 8 months ago

Thinking about this situation, I wanted to ask: would it be possible for you to provide the pre-trained weights for TAP-Vid DAVIS? I think that would be the optimal solution, and much simpler than retraining the model and making sure everything works perfectly. It would be deeply appreciated.

64327069 commented 8 months ago

I would like to ask how to evaluate the results; there is no eval code in the repo, nor any guidance on evaluation. Would you be willing to provide the eval code? I am trying to reproduce the results now.

64327069 commented 8 months ago

I mean evaluating metrics such as OA, AJ, etc.

qianqianwang68 commented 8 months ago

Thank you for your questions.

This folder contains a script for evaluation (eval_tapvid_davis.py) and the pre-trained weights which you can use to reproduce the exact result in the paper.

To run the evaluation:

  1. First download the .py and .zip files, unzip the zip file, and put both in the project directory.
  2. The paper's results were generated by an old model architecture that is slightly different from the released one, so please make the following modifications: change the hidden_size here from [256, 256, 256] to [256, 256], and then change this line to nn.Linear(input_dims + input_dims * ll * 2, proj_dims), nn.ReLU(), nn.Linear(proj_dims, proj_dims) (see the sketch after this list).
  3. Run python eval_tapvid_davis.py. If the evaluation runs successfully, you should get the following output, which matches the numbers in the paper: 30 | average_jaccard: 0.51746 | average_pts_within_thresh: 0.67490 | occlusion_acc: 0.85346 | temporal_coherence: 0.74060
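Editor's note: for concreteness, here is a rough sketch of what the projection head from step 2 might look like after the change. The names input_dims, ll (number of positional-encoding frequencies), and proj_dims come from the snippet above; the concrete values and the surrounding code are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

# Hypothetical values purely for illustration; the real ones come from the
# repo's config (input_dims: raw coordinate dims, ll: number of positional
# encoding frequencies, proj_dims: projection width).
input_dims, ll, proj_dims = 3, 4, 256

# Two-layer projection head as described in step 2 above
# (used together with hidden_size = [256, 256]).
proj = nn.Sequential(
    nn.Linear(input_dims + input_dims * ll * 2, proj_dims),
    nn.ReLU(),
    nn.Linear(proj_dims, proj_dims),
)

# Quick shape check on a positionally encoded input
# (input_dims raw values plus sin/cos at ll frequencies each).
x = torch.randn(8, input_dims + input_dims * ll * 2)
print(proj(x).shape)  # torch.Size([8, 256])
```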

Regarding the hyperparameters: yes, we used a different set of hyperparameters for the TAP-Vid evaluation (but they were the same across all TAP-Vid videos). The reason is that TAP-Vid videos have a much lower resolution (256x256); we found that RAFT performance degrades at that resolution, and relying more on the photometric information by upweighting its loss helps improve performance. I hope this is helpful for you, at least for now. Please allow me some time to integrate and organize things into the codebase and release more details.

Mixanik-43 commented 6 months ago

Hello! Given this issue and #42, here are some changes that should be applied to the default config to reproduce training and evaluation on the TAP-Vid dataset from the OmniMotion paper:

Is this correct? Are there any other changes required to reproduce the quantitative results?

> Relying more on the photometric information by upweighting its loss helps improve performance.

So, in your TAP-Vid training, was the photometric loss weight increased from 0 to 10 over the first 50k steps and then kept fixed at 10, or was some other schedule applied?
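Editor's note: to make the question concrete, here is a tiny sketch of the ramp schedule being asked about (weight rising linearly from 0 to 10 over the first 50k steps, then held constant). The function name and the numbers are illustrative assumptions, not values confirmed by the authors.

```python
def photometric_loss_weight(step, max_weight=10.0, ramp_steps=50_000):
    # Linear warm-up from 0 to max_weight over ramp_steps, then constant.
    # Mirrors the schedule asked about above; not confirmed by the authors.
    return max_weight * min(step / ramp_steps, 1.0)

# e.g. photometric_loss_weight(25_000) -> 5.0
#      photometric_loss_weight(80_000) -> 10.0
```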

nargenziano commented 5 months ago

Would you mind sharing the full config file used for the results in the paper?

Guo-1212 commented 4 months ago

Hello! In the annotations folder, I can see that each video sequence corresponds to a pkl file. I would like to ask how this file was obtained. There is no such file in the training results, and I did not find the module that generates it in the code.

Guo-1212 commented 4 months ago

The paper mentions these two methods (see the screenshot attached in the original comment). What changes can be made to the published code to obtain the other method, or could you publish the code for the other method?

YiJian666 commented 2 months ago

> Hello! In the annotations folder, each video sequence corresponds to a pkl file. I would like to ask how this file was obtained. It is not in the training results, and I did not find the module that generates it in the code.

Hello, have you figured this out yet? Could you share how the corresponding pkl file is obtained?