naver / mast3r

Grounding Image Matching in 3D with MASt3R

Unable to reproduce results on Aachen 'day' #82

Open · bakuljangley opened this issue 3 weeks ago

bakuljangley commented 3 weeks ago

Here is the command I ran and its output:

(mast3r) (base) bjangley : ~/VPR/mast3r$ CUDA_VISIBLE_DEVICES=7 python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset "VislocAachenDayNight('/home/bjangley/VPR/mast3r/datasets/aachenv11/', subscene='${scene}', pairsfile='fire_top50', topk=20)" --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /home/bjangley/VPR/mast3r/datasets/aachenv11/output/${scene}/loc --coarse_to_fine --max_batch_size 15 --c2f_crop_with_homography
100%|██████████████████████████████████████████████████████████████████████████████████████| 5232/5232 [00:00<00:00, 20947.26it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 6697/6697 [00:56<00:00, 119.31it/s]
100%|███████████████████████████████████████████████████████████████████████████████| 2324648/2324648 [00:11<00:00, 194997.94it/s]
  0%|                                                                                                     | 0/824 [00:00<?, ?it/s]/home/bjangley/VPR/mast3r/dust3r/dust3r/inference.py:44: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=bool(use_amp)):
/home/bjangley/VPR/mast3r/dust3r/dust3r/model.py:205: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/bjangley/VPR/mast3r/dust3r/dust3r/inference.py:48: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
100%|███████████████████████████████████████████████████████████████████████████████████████| 824/824 [28:40:51<00:00, 125.31s/it]
VislocAachenDayNight('/home/bjangley/VPR/mast3r/datasets/aachenv11/', subscene='day', pairsfile='fire_top50', topk=20): 824 images - median_pos_error=np.float64(747.6884829190608), median_angular_error=np.float64(172.8515834669994)  - acc@0.1m,1deg=0.000  - acc@0.25m,2deg=0.000  - acc@0.5m,5deg=0.000  - acc@5m,10deg=0.000
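For reference, the acc@Xm,Ydeg figures in this log follow the usual visual-localization convention. Each one is the fraction of queries whose camera-position error is below X metres and whose rotation error is below Y degrees. Below is a minimal sketch of that computation. It assumes world-to-camera poses (R, t) and strict thresholds, and it is not the code in visloc.py. The names `pose_errors` and `accuracy` are hypothetical.

```python
import numpy as np


def pose_errors(R_est, t_est, R_gt, t_gt):
    """Return (position error, rotation error in degrees) for one query.

    Assumption: R_* are 3x3 world-to-camera rotations and t_* the matching
    translations, so the camera centre in world coordinates is C = -R^T t.
    """
    c_est = -R_est.T @ t_est
    c_gt = -R_gt.T @ t_gt
    pos_err = float(np.linalg.norm(c_est - c_gt))
    # Rotation error = angle of the relative rotation R_est^T R_gt.
    # Clipping guards against round-off pushing the cosine outside [-1, 1].
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    ang_err = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return pos_err, ang_err


def accuracy(pos_errs, ang_errs, thresholds):
    """Fraction of queries with pos error < metres AND angle error < degrees.

    `thresholds` is a list of (metres, degrees) pairs; strict '<' is an assumption.
    """
    pos = np.asarray(pos_errs, dtype=float)
    ang = np.asarray(ang_errs, dtype=float)
    return {(m, deg): float(np.mean((pos < m) & (ang < deg))) for m, deg in thresholds}
```

With the thresholds printed above, the call would be `accuracy(pos, ang, [(0.1, 1), (0.25, 2), (0.5, 5), (5, 10)])`. The result is only meaningful if the reference poses (R_gt, t_gt) are real ground truth.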

I'm not sure whether the problem is caused by how my dataset is set up:

/mast3r/datasets/aachenv11
├── 3D-models
│   └── aachen_v_1_1
├── day_time_queries_with_intrinsics.txt
├── images
│   ├── db
│   ├── query
│   └── sequences
├── kapture
│   ├── mapping
│   ├── query
│   ├── query_day
│   └── query_night
├── mapping
│   └── colmap
├── night_time_queries_with_intrinsics.txt
├── output
│   ├── day
│   └── loc
└── pairsfile
    └── query

I followed the data preparation instructions here and downloaded the pairsfile using wget http://download.europe.naverlabs.com/kapture/Aachen_Day_Night_1_1_fire_top50_query_pairs.txt.
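One quick way to rule out a dataset-layout problem is to parse the pairs file and confirm that every image it references exists on disk. The sketch below makes several assumptions. It assumes a kapture-style pairs file with one comma-separated `query_image, map_image, score` entry per line and '#' comment lines. It also assumes the image names are relative to the `images` directory. The helper name `check_pairsfile` is hypothetical and is not part of mast3r.

```python
import os


def check_pairsfile(pairs_path, images_root):
    """Count pairs and list referenced images that are missing on disk.

    Assumed format (hypothetical, kapture-style), '#' starts a comment line:
        query_image, map_image, score
    Image names are assumed to be relative to `images_root`.
    """
    n_pairs = 0
    missing = set()
    with open(pairs_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            fields = [x.strip() for x in line.split(",")]
            if len(fields) < 2:
                raise ValueError(f"unexpected pairs line: {line!r}")
            n_pairs += 1
            # Check both the query image and the retrieved database image.
            for name in fields[:2]:
                if not os.path.isfile(os.path.join(images_root, name)):
                    missing.add(name)
    return n_pairs, sorted(missing)
```

Call it with the downloaded pairs file and the dataset's images/ folder. An empty `missing` list means the pairs and the image tree agree. In that case the layout is unlikely to be the cause of the bad metrics.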

yocabon commented 3 weeks ago

Hi, the ground truth for Aachen isn't publicly available, so it's expected that the metrics printed in the output are garbage.

You have to upload the '*_ltvl.txt' file to https://www.visuallocalization.net/ to evaluate it.
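Before uploading, it can help to sanity-check the pose file locally. This sketch assumes that the '*_ltvl.txt' file uses the visuallocalization.net submission layout as we understand it: one line per query, `name qw qx qy qz tx ty tz`, with the quaternion and translation describing the world-to-camera pose. That mast3r writes exactly this layout is an assumption. The helper `validate_ltvl` is hypothetical.

```python
import math


def validate_ltvl(path, expected_count=None, tol=1e-3):
    """Check a pose file assumed to be in 'name qw qx qy qz tx ty tz' layout.

    Returns a list of human-readable problems; an empty list means no issues found.
    """
    problems = []
    seen = set()
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) != 8:
                problems.append(f"line {lineno}: expected 8 fields, got {len(fields)}")
                continue
            name = fields[0]
            try:
                vals = [float(v) for v in fields[1:]]
            except ValueError:
                problems.append(f"line {lineno}: non-numeric pose value")
                continue
            if not all(math.isfinite(v) for v in vals):
                problems.append(f"line {lineno}: non-finite pose value")
                continue
            if name in seen:
                problems.append(f"line {lineno}: duplicate entry for {name}")
            seen.add(name)
            # A rotation quaternion (qw, qx, qy, qz) should have unit norm.
            norm = math.sqrt(sum(v * v for v in vals[:4]))
            if abs(norm - 1.0) > tol:
                problems.append(f"line {lineno}: quaternion norm {norm:.4f} != 1")
    if expected_count is not None and len(seen) != expected_count:
        problems.append(f"{len(seen)} poses found, expected {expected_count}")
    return problems
```

For the day run above, `expected_count=824` would match the number of queries reported in the log. Whether the server also expects the night queries in the same file is not covered here.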

bakuljangley commented 3 weeks ago

@yocabon Which dataset should I use, then? I want to run some experiments to see how well this approach can localise a camera.

I tried InLoc (the dataset is too large for me) and Aachen (the output is garbage).