naver / dust3r

DUSt3R: Geometric 3D Vision Made Easy
https://dust3r.europe.naverlabs.com/

Table 1 discrepancies #171

Open kmatzen opened 3 weeks ago

kmatzen commented 3 weeks ago

Could I get some help understanding how Table 1 in the paper was computed? I tried to reproduce the results using the provided DUSt3R_ViTLarge_BaseDecoder_512_dpt model and a model that I trained myself with the provided code. I started by comparing the two models with visloc.py on the 7-scenes dataset, but the numbers for the provided model don't match what is reported in the paper.

| Subscene | Variant | Images | Median Pos Error (m) | Median Angular Error (deg) | acc@0.1m,1deg (%) | acc@0.25m,2deg (%) | acc@0.5m,5deg (%) | acc@5m,10deg (%) |
|---|---|---|---|---|---|---|---|---|
| chess | Given | 2000 | 0.08876 | 2.79708 | 2.85 | 29.05 | 91.00 | 98.45 |
| chess | Mine | 2000 | 0.08291 | 2.92299 | 3.95 | 27.90 | 88.95 | 98.45 |
| fire | Given | 2000 | 0.05118 | 1.92276 | 21.25 | 51.65 | 90.50 | 98.10 |
| fire | Mine | 2000 | 0.05716 | 2.22613 | 18.45 | 45.20 | 91.00 | 99.25 |
| heads | Given | 1000 | 0.02548 | 1.55556 | 27.90 | 64.70 | 90.20 | 92.20 |
| heads | Mine | 1000 | 0.03809 | 2.14186 | 13.50 | 45.50 | 80.60 | 87.00 |
| office | Given | 4000 | 0.18812 | 4.79558 | 3.85 | 17.58 | 51.63 | 63.53 |
| office | Mine | 4000 | 0.21497 | 5.31324 | 4.43 | 15.80 | 48.08 | 62.98 |
| pumpkin | Given | 2000 | 0.15922 | 3.88746 | 2.75 | 24.20 | 62.55 | 79.80 |
| pumpkin | Mine | 2000 | 0.16227 | 4.01992 | 4.20 | 24.60 | 57.00 | 77.70 |
| redkitchen | Given | 5000 | 0.12894 | 3.88511 | 3.84 | 21.10 | 60.48 | 80.38 |
| redkitchen | Mine | 5000 | 0.12973 | 3.53684 | 4.10 | 21.20 | 62.40 | 83.18 |
| stairs | Given | 1000 | inf | inf | 1.50 | 7.30 | 17.10 | 18.90 |
| stairs | Mine | 1000 | inf | inf | 0.50 | 2.60 | 8.00 | 10.90 |
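
For readers comparing these columns, here is a minimal sketch of how such metrics are commonly computed. It assumes the usual definitions: position error is the Euclidean distance between camera positions, angular error is the geodesic angle between rotations, and accuracy is the percentage of images under both thresholds. The function names, the strict `<` comparison, and the use of `inf` for failed localizations are assumptions for illustration, not taken from visloc.py.

```python
import math
import statistics

# Thresholds as (metres, degrees), matching the column names in the table.
THRESHOLDS = [(0.1, 1), (0.25, 2), (0.5, 5), (5, 10)]


def pose_errors(R_pred, t_pred, R_gt, t_gt):
    """Return (position error, angular error in degrees) for one image.

    Rotations are 3x3 nested lists, translations are length-3 sequences.
    """
    pos_err = math.dist(t_pred, t_gt)
    # trace(R_pred^T @ R_gt) equals the sum of element-wise products.
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return pos_err, math.degrees(math.acos(cos_angle))


def summarize(pos_errs, ang_errs, thresholds=THRESHOLDS):
    """Aggregate per-image errors into medians and accuracy percentages."""
    out = {
        "median_pos_error": statistics.median(pos_errs),
        "median_angular_error": statistics.median(ang_errs),
    }
    n = len(pos_errs)
    for t, a in thresholds:
        hits = sum(1 for p, r in zip(pos_errs, ang_errs) if p < t and r < a)
        out[f"acc@{t}m,{a}deg"] = 100.0 * hits / n
    return out
```

Under these assumptions, an `inf` median (as in the stairs rows) would mean that about half or more of the per-image errors are infinite, for example because localization failed on those images.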

The "Given" results were computed with commands like this one (shown here for redkitchen):

python3 visloc.py \
  --dataset "VislocSevenScenes('/mnt/localssd/7scenes/', subscene='redkitchen', pairsfile='APGeM-LM18_top20', topk=1)" \
  --pnp_mode poselib \
  --reprojection_error_diag_ratio 0.008 \
  --output_dir /mnt/localssd/7scenes/redkitchen/loc-given_poselib_stage3 \
  --model_name DUSt3R_ViTLarge_BaseDecoder_512_dpt
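
To repeat the command above for every subscene in the table, a small helper along these lines could build the argument lists. This is only a sketch: it reuses the flags and values exactly as they appear in the command, while `visloc_command`, the `tag` parameter, and the output directory layout are hypothetical names chosen for illustration.

```python
import shlex

# The seven subscenes listed in the results table.
SUBSCENES = ["chess", "fire", "heads", "office", "pumpkin", "redkitchen", "stairs"]


def visloc_command(root, subscene,
                   model_name="DUSt3R_ViTLarge_BaseDecoder_512_dpt",
                   tag="loc-given_poselib_stage3"):
    """Build the visloc.py argument list for one subscene (hypothetical helper)."""
    dataset = (
        f"VislocSevenScenes('{root}', subscene='{subscene}', "
        "pairsfile='APGeM-LM18_top20', topk=1)"
    )
    return [
        "python3", "visloc.py",
        "--dataset", dataset,
        "--pnp_mode", "poselib",
        "--reprojection_error_diag_ratio", "0.008",
        "--output_dir", f"{root.rstrip('/')}/{subscene}/{tag}",
        "--model_name", model_name,
    ]


if __name__ == "__main__":
    # Print the commands for inspection instead of running them.
    for name in SUBSCENES:
        print(shlex.join(visloc_command("/mnt/localssd/7scenes/", name)))
```

Printing the commands first makes it easy to check that each one matches the form used above before running anything.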
yocabon commented 1 day ago

Hi, I do not get the same results as you, so I am not sure what's going on.

python3 visloc.py --model_name DUSt3R_ViTLarge_BaseDecoder_512_dpt --dataset "VislocSevenScenes('/path/to/7-scenes/', subscene='chess', pairsfile='APGeM-LM18_top20', topk=1)" --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/dust3r_7scenes/20_09_24/chess/loc 

gives me

VislocSevenScenes('/path/to/7-scenes/', subscene='chess', pairsfile='APGeM-LM18_top20', topk=1): 2000 images - median_pos_error=0.027831130071196423, median_angular_error=0.9597363623122772  - acc@0.1m,1deg=53.050  - acc@0.25m,2deg=89.400  - acc@0.5m,5deg=97.700  - acc@5m,10deg=97.750
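
To compare a summary line like the one above against the table in the first comment, the metrics can be pulled out with a regular expression. This is a sketch that assumes the summary always uses the `key=value` layout shown here; `parse_summary` is a hypothetical helper, not part of the repository.

```python
import re

# Matches the metric names and numeric values seen in the summary line.
_METRIC = re.compile(
    r"(median_pos_error|median_angular_error|acc@[^=\s]+)=([-+0-9.eE]+|inf|nan)"
)


def parse_summary(line):
    """Return a dict mapping each metric name in the line to a float."""
    return {key: float(value) for key, value in _METRIC.findall(line)}


if __name__ == "__main__":
    line = (
        "VislocSevenScenes('/path/to/7-scenes/', subscene='chess', "
        "pairsfile='APGeM-LM18_top20', topk=1): 2000 images - "
        "median_pos_error=0.027831130071196423, "
        "median_angular_error=0.9597363623122772  - acc@0.1m,1deg=53.050  "
        "- acc@0.25m,2deg=89.400  - acc@0.5m,5deg=97.700  - acc@5m,10deg=97.750"
    )
    # "Given" chess row from the table in the first comment.
    reported = {"median_pos_error": 0.08876, "acc@0.1m,1deg": 2.85}
    parsed = parse_summary(line)
    for key, value in reported.items():
        print(f"{key}: table={value} log={parsed[key]}")
```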
kmatzen commented 1 day ago

Would it be possible for you to share some preprocessed data as per https://github.com/naver/dust3r/tree/main/dust3r_visloc#7scenes? Then I could see if there's a problem with how I followed the preprocessing instructions or if there's a problem with how the model is used.