princeton-vl / RAFT-Stereo


Question about the evaluate result #4

Closed Harmon-l closed 2 years ago

Harmon-l commented 3 years ago

Thank you for sharing your work; there is a lot to learn from this code. I trained with python train_stereo.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision and obtained the model 200000_raft-stereo.pth. I tested the trained model on Middlebury, but the result is poor. I then downloaded the raftstereo-sceneflow.pth model you provided, but its result on Middlebury_Q is only 10.93, which does not reach the 9.36 reported in the paper.

python evaluate_stereo.py --restore_ckpt models/raftstereo-sceneflow.pth --dataset middlebury_Q
Validation MiddleburyQ: EPE 1.3627132922410965, D1 10.93138531781733

I would like to ask what causes this, and how I can obtain the results mentioned in the paper. Thank you very much.
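For reference, a minimal sketch of how the reported EPE and D1 numbers are typically computed from a predicted disparity map (an illustration with hypothetical tensor names, not the repo's exact evaluate_stereo.py code):

```python
import torch

def epe_and_d1(disp_pred, disp_gt, valid, thresh=3.0):
    # End-point error (EPE): mean absolute disparity error over valid pixels.
    err = (disp_pred - disp_gt).abs()[valid]
    epe = err.float().mean().item()
    # D1: percentage of valid pixels whose error exceeds `thresh` pixels
    # (the threshold varies by benchmark; 3 px is a common choice).
    d1 = 100.0 * (err > thresh).float().mean().item()
    return epe, d1
```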

lahavlipson commented 3 years ago

The Middlebury and ETH3D validation datasets are extremely small, which leads to very high variability in error. This is why we report average results across multiple runs.
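A minimal sketch of that averaging, assuming you have metrics from several independently trained checkpoints (the dict layout and example values below are placeholders, not the repo's actual interface):

```python
import numpy as np

def average_metrics(per_run_metrics):
    # per_run_metrics: list of dicts like {"epe": ..., "d1": ...},
    # one dict per independently trained model. Returns the mean and
    # standard deviation of each metric across runs; a high std on a
    # small validation set is exactly the variability described above.
    keys = per_run_metrics[0].keys()
    return {k: (np.mean([m[k] for m in per_run_metrics]),
                np.std([m[k] for m in per_run_metrics]))
            for k in keys}

# Example with hypothetical values from three training runs:
# runs = [{"epe": 1.28, "d1": 9.1}, {"epe": 1.41, "d1": 10.9}, {"epe": 1.33, "d1": 9.4}]
# print(average_metrics(runs))
```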

Here is a sceneflow-trained model that beats the reported accuracies on all datasets: https://drive.google.com/file/d/1ACnveiQ29vskIR-j8mCKmW8kIID8gsz2/view?usp=sharing

However, the raftstereo-sceneflow.pth model weights from the README perform better overall (especially on ETH3D and Middlebury_F).

lahavlipson commented 2 years ago

I'll close this for now - you can reopen if you need to follow up.