Weird reproducibility error at inference

rpautrat / SuperPoint

Efficient neural feature detector and descriptor

MIT License

1.93k stars, 424 forks

Weird reproducibility error at inference #332

Open leblond14u opened 2 weeks ago

leblond14u commented 2 weeks ago

Hi,

I'm using LightGlue with SuperPoint and I noticed some strange SuperPoint behavior. When extracted on the sacre_coeur data (with no maximum number of points), my keypoints look wrong: [screenshot, 2024-11-08 09-56-13]. When extracted with a maximum of 2048 points, the keypoints look a bit better, but still not right. This behavior then leads to no matches at all between the two images.

Has anybody encountered the same issue with the descriptor, or does anyone know how to solve it?

Thanks in advance, Best regards,

Hugo

rpautrat commented 2 weeks ago

Hi, are you sure that the SuperPoint is correctly initialized, i.e. with the right pre-trained weights loaded?

Do you get the same if you test it on other images?
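
One way to check the weight loading is to compare the parameter names the model expects with the names stored in the checkpoint; a mismatch usually means some layers kept their random initialization. The sketch below is a minimal, framework-agnostic version. `diff_state_dict_keys` is a hypothetical helper, and the key names in the example are made up for illustration. With PyTorch, and assuming the checkpoint file stores a plain state dict, the two inputs would typically come from `model.state_dict().keys()` and `torch.load(path, map_location="cpu").keys()`.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, both sorted.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint that the model does not use
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Illustrative key names only; real ones come from the model and the checkpoint.
model_keys = ["conv1a.weight", "conv1a.bias", "convDb.weight"]
checkpoint_keys = ["conv1a.weight", "conv1a.bias", "descriptor.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing from checkpoint:", missing)
print("not used by model:", unexpected)
```

Two empty lists mean the names line up; they do not prove the values are the right ones. In PyTorch, `load_state_dict` with its default `strict=True` raises an error on such a mismatch, so silent failures are more likely when `strict=False` is passed.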

leblond14u commented 2 weeks ago

Hi, thanks for your answer. I get the same kind of results for the easy DSC_0410 scenario:

My weights are loaded from the LightGlue release at https://github.com/cvg/LightGlue/releases/download/v0.1_arxiv/superpoint_v1.pth. Still no matches are found.

GoroYeh-HRI commented 2 weeks ago

Hi, I'm running match_features_demo.py with the pretrained model sp_v6, as instructed here: https://github.com/rpautrat/SuperPoint?tab=readme-ov-file#matching-features-demo-with-pretrained-weights

However, the second descriptor array extracted by the SuperPoint model has shape (0, 256), which raises an error and makes matching impossible.

The pair of images being loaded is hpatches-sequences-release/i_pool/1.ppm and 6.ppm. Any idea why I'd get a zero-size descriptor array from the darker image? Thanks.
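
As a stopgap for the crash itself, matching can be skipped when either image yields no descriptors. The sketch below is a minimal illustration, not code from this repo: `safe_match` and `nearest_neighbor_matcher` are hypothetical names, and the matcher is a plain brute-force nearest-neighbor search standing in for whatever the demo script uses. The guard relies on `len()`, which returns 0 for a list with no rows and also for a NumPy array of shape (0, 256).

```python
def nearest_neighbor_matcher(desc0, desc1):
    """Brute-force matcher: pair each row of desc0 with its closest row in desc1."""
    matches = []
    for i, d0 in enumerate(desc0):
        # Squared Euclidean distance from d0 to every descriptor in desc1.
        dists = [sum((a - b) ** 2 for a, b in zip(d0, d1)) for d1 in desc1]
        matches.append((i, dists.index(min(dists))))
    return matches


def safe_match(desc0, desc1, matcher):
    """Run `matcher` only if both images produced at least one descriptor."""
    n0, n1 = len(desc0), len(desc1)
    if n0 == 0 or n1 == 0:
        print(f"skipping matching: {n0} and {n1} descriptors")
        return []
    return matcher(desc0, desc1)


# Toy 2-D descriptors; real SuperPoint descriptors have 256 dimensions.
desc_a = [[1.0, 0.0], [0.0, 1.0]]
desc_b = [[0.0, 0.9], [0.9, 0.1]]
print(safe_match(desc_a, desc_b, nearest_neighbor_matcher))
print(safe_match([], desc_b, nearest_neighbor_matcher))
```

This only avoids the error; it does not explain why the darker image produces no keypoints in the first place.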

leblond14u commented 1 week ago

Update: @rpautrat, it seems my problem is specific to inference on CUDA devices. When running SuperPoint on my CPU, I have no trouble finding good matches on the sacre_coeur scenario. My setup is:

Nvidia A5000
CUDA Version: 12.2
torch 2.1.2
torchvision 0.16.2
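
To put a number on the CPU/CUDA difference described above, the two sets of detected keypoints can be compared directly. The sketch below is a hedged illustration: `keypoint_overlap` is a hypothetical helper, the coordinates are made up, and the inputs are assumed to be plain lists of (x, y) pairs (for example, a keypoint tensor moved to the CPU and converted with `.tolist()`).

```python
import math


def keypoint_overlap(kpts_ref, kpts_other, tol=1.0):
    """Fraction of reference keypoints with a counterpart within `tol` pixels."""
    if len(kpts_ref) == 0:
        return 0.0
    hits = 0
    for xr, yr in kpts_ref:
        # Brute-force search; fine for a few thousand keypoints.
        if any(math.hypot(xr - xo, yr - yo) <= tol for xo, yo in kpts_other):
            hits += 1
    return hits / len(kpts_ref)


# Made-up coordinates standing in for the CPU and CUDA detections.
kpts_cpu = [(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)]
kpts_cuda = [(10.5, 20.0), (200.0, 5.0)]

print(f"overlap: {keypoint_overlap(kpts_cpu, kpts_cuda):.2f}")
```

An overlap close to 1.0 would point to ordinary floating-point differences between devices, while a value near 0 would suggest the two runs are computing something substantially different.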

@GoroYeh-HRI, is your problem also related to GPU usage?

Best,

Hugo

GoroYeh-HRI commented 1 week ago

@leblond14u Thanks for asking. My GPU setup is:

NVIDIA RTX A5000
CUDA Driver Version: 12.5
nvcc -V: cuda 11.6
torch 2.0.1
torchvision 0.15.2

Not sure if this has something to do with the "no descriptor" error I ran into. @rpautrat, do you have any idea?

rpautrat commented 1 week ago

@leblond14u, I tested the torch SuperPoint model with recent versions of CUDA/Torch (CUDA 12.6 and Torch 2.4.1), and the detections look normal for me. So I am not sure where your problem is coming from... All I can suggest is to try another set of CUDA/Torch versions and see if this helps to resolve the problem.

@GoroYeh-HRI, which tensorflow version are you using? This repo is using an old version (e.g. 1.12 recommended).
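
When comparing CUDA/Torch combinations on the PyTorch side, it helps to record exactly which versions are in use. The sketch below is one possible way to collect them; `describe_env` is a hypothetical helper, and it returns `None` for anything it cannot query (for instance when PyTorch is not installed or no GPU is visible).

```python
import importlib
import platform


def describe_env():
    """Collect Python/PyTorch/CUDA version info; values are None if unavailable."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cudnn": None,
        "gpu": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["cudnn"] = torch.backends.cudnn.version()
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for name, value in describe_env().items():
    print(f"{name}: {value}")
```

The CUDA version PyTorch was built against can differ from both the "CUDA Version" reported by the driver and the version printed by `nvcc -V`, so it is worth listing all of them when reporting a setup.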

GoroYeh-HRI commented 1 week ago

Thanks for the prompt reply! When I used TensorFlow 1.12, I ran into an issue when training MagicPoint: I got loss=nan, precision=nan, recall=0.0. I read through the GitHub issues and still could not resolve it. That's why I assume this is due to an incompatibility between the TensorFlow version and my CUDA driver version (12.5).

rpautrat commented 6 days ago

Yes, this is very much possible. Unfortunately, this repo is getting old and is probably only compatible with older versions of CUDA. Can you try with an earlier version?