Open skalien opened 3 months ago
Hi, thanks for opening this issue. This does look like a bug. I replaced the assert with clipping so that it no longer crashes. This is more of a workaround than an actual fix. The code we pushed for the sparse global alignment is not yet final, and we hope to release a cleaner version eventually.
The code now runs without error, but the results are not good. I tried it on a dataset on which COLMAP works, and your revised code does not give output as good as COLMAP's. Looking at the losses, the optimization seems to be stuck in a local minimum. Is it possible to load a COLMAP poses_bounds.npy file as the initialization point for the optimization? Or can you take a look at the log and tell me whether there are better hyperparameters I should try? Thanks for your immense help.
Log:
~/mast3r main CUDA_VISIBLE_DEVICES=1 python demo.py \
--weights checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth \
--local_network \
--server_port 6006 \ls
usage: mast3r demo [-h] [--local_network | --server_name SERVER_NAME] [--image_size {512,224}] [--server_port SERVER_PORT]
(--weights WEIGHTS | --model_name {MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric}) [--device DEVICE] [--tmp_dir TMP_DIR]
[--silent] [--share]
mast3r demo: error: unrecognized arguments: ls
~/mast3r main CUDA_VISIBLE_DEVICES=1 python demo.py \
--weights checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth \
--local_network \
--server_port 6006
... loading model from checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth
instantiating : AsymmetricMASt3R(enc_depth=24, dec_depth=12, enc_embed_dim=1024, dec_embed_dim=768, enc_num_heads=16, dec_num_heads=12, pos_embed='RoPE100',img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), patch_embed_cls='PatchEmbedDust3R', two_confs=True, desc_conf_mode=('exp', 0, inf), landscape_only=False)
<All keys matched successfully>
Outputing stuff in /tmp/tmpc146n_52_mast3r_gradio_demo
Running on local URL: http://0.0.0.0:6006
To create a public link, set `share=True` in `launch()`.
>> Loading a list of 20 images
- adding /tmp/gradio/d4a5cebb09915dbb77ff519a92697e334c771599/image010.goosegro_camera1A.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/fa754af3da50d67cf4af5d0235170d0e0abfd1ca/image010.goosegro_camera1B.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/9661e529bd120d6fb303e0e91fab23c58b27fcd0/image010.goosegro_camera1C.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/dfe36d0e9f2a9e623f1a6fcda2a9d3793da2ed18/image010.goosegro_camera1D.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/fb80a28c1fc4143591d25c3fe380f5df253d3dbb/image010.goosegro_camera1E.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/e56e37f05aacad581f29b5c77cc7070219fc843e/image010.goosegro_camera1F.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/e3dc41b07aabd85a36c9cac8a1034669c59d7c6b/image010.goosegro_camera1G.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/0e92e9ce82138ef5c330c7a23d51c8730c2a68fe/image010.goosegro_camera1H.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/c75014d2951085fd734c00e222c22d325b2d8eba/image010.goosegro_camera1I.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/3ffc1f0be19d8c24391fe54b5015f80c3d492a50/image010.goosegro_camera1J.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/99bfc8e86700bae693de3732e3f879d2e289b30a/image010.goosegro_camera2A.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/7c4728d3c6056794f6cf562361c091b0019cfaa1/image010.goosegro_camera2B.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/b58d7772e8ae8bed85cf80cbbb2f98f9611385af/image010.goosegro_camera2C.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/a15b74990e7d2fd34d3347c36943e93e569e508f/image010.goosegro_camera2D.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/158450a4d92aa85ca6a7209a167130228f8f0e69/image010.goosegro_camera2E.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/ba5be2730c62d4587bcb0278b61b63ebb501b579/image010.goosegro_camera2F.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/2e125d1c23ceb22fb1b285cfa0c5b723403e3e32/image010.goosegro_camera2G.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/9050a8f0cdbf83133fb630d01db1d043f8e8369a/image010.goosegro_camera2H.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/488986c82d5009db29572ea76140384dae4a8cf1/image010.goosegro_camera2I.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
- adding /tmp/gradio/b3118dd2dc5299af395391388897df080d9685b3/image010.goosegro_camera2J.aovs.09_06.DENOISE.001.jpg with resolution 1536x960 --> 512x320
(Found 20 images)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 380/380 [01:11<00:00, 5.29it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00, 4.76it/s]
init focals = [221.70247 221.70247 221.70247 337.40414 360.13602 328.17224 437.2105
473.91724 347.58273 221.70247 221.70247 251.29607 221.70247 221.70247
258.71204 221.70247 221.70247 221.70247 376.61337 221.70247]
100%|█████████████████████████████████████████████████████████████████████████████████████| 500/500 [01:03<00:00, 7.91it/s, lr=0.0000, loss=0.161]
>> final loss = 0.16088540852069855
100%|█████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:24<00:00, 8.07it/s, lr=0.0000, loss=1.538]
>> final loss = 1.5383764505386353
Final focals = [230.36617 236.69899 234.3281 260.5374 216.10912 244.1593 269.19937
316.50726 300.4955 272.22772 241.74031 250.68259 245.31836 231.41617
210.68994 250.09065 278.99652 283.1394 288.26767 286.3053 ]
(exporting 3D scene to /tmp/tmpc146n_52_mast3r_gradio_demo/scene.glb )
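On the question of starting from COLMAP poses: below is a minimal sketch of how a poses_bounds.npy file is usually unpacked, assuming it follows the LLFF convention (one row of 17 values per image: a flattened 3x5 matrix holding a 3x4 camera-to-world pose plus a column of height, width and focal length, followed by the near and far depth bounds). The function name `split_poses_bounds` and the file path in the usage comment are hypothetical, and nothing in this thread confirms that the sparse global alignment accepts such poses as a starting point.

```python
import numpy as np

def split_poses_bounds(arr):
    """Split an LLFF-style (N, 17) array into poses, intrinsics and depth bounds."""
    arr = np.asarray(arr, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[1] != 17:
        raise ValueError(f"expected shape (N, 17), got {arr.shape}")
    mats = arr[:, :15].reshape(-1, 3, 5)  # one 3x5 matrix per image
    cam2world = mats[:, :, :4]            # 3x4 [R | t], still in LLFF axis order
    hwf = mats[:, :, 4]                   # (height, width, focal) per image
    bounds = arr[:, 15:]                  # (near, far) per image
    return cam2world, hwf, bounds

# Usage (the path is an assumption):
# cam2world, hwf, bounds = split_poses_bounds(np.load("poses_bounds.npy"))
```

Two things would still need handling before such poses could be compared with the demo's output: LLFF stores the rotation columns in a different axis order than the OpenCV convention, and the focal length refers to the original image size, so it would have to be rescaled for the 1536x960 --> 512x320 resize shown in the log.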
Right now, the sparse global alignment is hit or miss; we are still working on it. You can try setting matching_conf_thr to 0: it was meant to help with images that have zero overlap, but it ends up hurting the other scenarios. If your images are too close to one another, that can also cause issues.
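To make the role of such a threshold concrete, here is a toy sketch, not the repository's code. It assumes that each image pair gets a scalar matching confidence and that pairs scoring below the threshold are handled differently from confidently matched ones. The names `select_confident_pairs` and `scores`, and all the numbers, are made up for illustration.

```python
import numpy as np

def select_confident_pairs(pair_scores, conf_thr):
    """Return a boolean mask of the pairs whose score reaches conf_thr."""
    scores = np.asarray(pair_scores, dtype=np.float64)
    return scores >= conf_thr

scores = np.array([0.3, 2.1, 7.5, 12.0])            # made-up pair confidences
mask_strict = select_confident_pairs(scores, 5.0)   # arbitrary non-zero threshold
mask_all = select_confident_pairs(scores, 0.0)      # threshold disabled
```

With the threshold at 0, every pair with a non-negative score passes, so no pair is set aside as poorly matched.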
Hi, I have a question about the sparse global alignment. It seems that the optimization uses the 2D-2D correspondences to minimize the coarse 3D-3D loss and the 2D-3D projection loss. The selected correspondence points are anchored to the "core depth" (which, to my understanding, is uniformly sampled from the image). But why not optimize those selected correspondence points directly? Why optimize the core depth instead?
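One way to picture the anchoring described in this question, as a toy sketch rather than the actual implementation: each correspondence pixel keeps a fixed depth ratio relative to its nearest grid anchor, so updating a handful of anchor depths moves every tied pixel with it. All function names, the grid spacing and the nearest-anchor rule are assumptions made for illustration.

```python
import numpy as np

def make_anchor_grid(height, width, step):
    """Regular grid of anchor pixel coordinates (x, y), one per step x step cell."""
    ys = np.arange(step // 2, height, step)
    xs = np.arange(step // 2, width, step)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)

def tie_to_anchors(pixels, depth_map, anchors):
    """For each pixel, find its nearest anchor and the fixed depth ratio to it."""
    d2 = ((pixels[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    pix_depth = depth_map[pixels[:, 1], pixels[:, 0]]
    anc_depth = depth_map[anchors[idx, 1], anchors[idx, 0]]
    return idx, pix_depth / anc_depth

def depths_from_anchors(anchor_depths, idx, ratios):
    """Recover pixel depths from the (optimizable) anchor depths."""
    return anchor_depths[idx] * ratios

# Synthetic, strictly positive depth map and three (x, y) correspondence pixels.
depth = np.fromfunction(lambda y, x: 1.0 + 0.01 * x + 0.02 * y, (32, 48))
anchors = make_anchor_grid(32, 48, step=8)
pixels = np.array([[3, 5], [20, 17], [47, 31]])
idx, ratios = tie_to_anchors(pixels, depth, anchors)
anchor_depths = depth[anchors[:, 1], anchors[:, 0]]
restored = depths_from_anchors(anchor_depths, idx, ratios)
```

In this toy setup the free variables are the anchor depths only, no matter how many correspondence pixels are tied to them. Whether that is the motivation behind the real parameterization is for the authors to confirm.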
Hi, I am getting this assertion error. Can you please tell me what this error means, and whether it is a bug in the program or something else? Thanks.