mdaiter / openMVG

openMVG with a LATCH descriptor, an ORB descriptor, DEEP descriptors from the cvpr15compare repo, PNNet/Torch loader and a GPU-based L2 matcher integrated

Getting zero matches for LATCH_UNSIGNED + GPUBRUTEFORCEHAMMING & for LATCH_BINARY + BRUTEFORCEHAMMING #15

Open robeastham opened 7 years ago

robeastham commented 7 years ago

So ComputeFeatures seems to be running okay, though not great. Results from ComputeFeatures using LATCH_UNSIGNED seem better than LATCH_BINARY. That is to say, most of the .desc file sizes for my images are about the same as their SIFT equivalents (avg about 3 MB per file; I am using a .png to mask each 18 MP image, so about 3/4 of the frame is masked; 96 images in total). Sadly, the .feat files are about 30% smaller than their SIFT equivalents, though some outliers have more features than SIFT and others significantly fewer.

When I run ComputeMatches using either GPUBRUTEFORCEHAMMING or BRUTEFORCEHAMMING against either LATCH_UNSIGNED or LATCH_BINARY, with a ratio of 0.8 and also 0.99, it completes without error in about 2 seconds, and every match file created after this step is 1 KB. So no matches at all. Something seems off here, given that the ComputeFeatures stage produced data similar to the SIFT results. Perhaps I've compiled something wrong, even though ComputeMatches seems to run without error.
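A quick way to confirm the empty outputs (a sketch; the exact match filenames are an assumption, so check what ComputeMatches actually writes into the output folder):

# list match files by size; uniform 1 KB files mean zero matches survived
ls -lS matches_dir/matches.*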

I'm using a turntable setup, so perhaps that's the problem. I'm not sure LATCH is even a good fit for situations like this, where the camera is fixed and the subject rotates?

I have tried PNNET too and ended up with significantly larger .desc files in the ComputeFeatures stage, but the smallest .feat files compared to SIFT and LATCH_UNSIGNED. I also got the same 1 KB match-file behaviour with PNNET. SIFT works fine and gives me a sparse cloud that rivals commercial photogrammetry software.

Any pointers you can provide would be great. I'd love to get the speed of a LATCH-based solution.

mdaiter commented 7 years ago

Ah, okay. Let me explain (and thank you so much for pointing this out, as there has been some confusion for a while about which matching version to use).

LATCH_BINARY needs to be used with the matching method GPU_LATCH. The reason is that @csp256's and @komrad36's matching kernel is specifically tuned to operate on 512-bit-long data. With that pairing, the matching kernel should work and return positive results.

LATCH_UNSIGNED stores the output of LATCH's method in 16 32-bit integers (16 × 32 = 512 bits, the same payload as LATCH_BINARY); therefore, it must be used with GPUBRUTEFORCEHAMMING. However, I strongly advise running LATCH with LATCH_BINARY and the custom GPU_LATCH matching method.

LATCH should be a good match for these scenarios. I don't know the dataset you're operating on, but keep in mind that LATCH is (for now) coupled with FAST keypoint detection via OpenCV's CUDA library.

PNNET should return the smallest .feat files: it (and all the other deep descriptors) uses the TBMR detection method to find the keypoints to describe.

So, to summarize: I'd suggest LATCH_BINARY as your feature description method and GPU_LATCH as your matching method.
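Concretely, the pairing looks like this (directory and file names are placeholders; the flags mirror the commands used later in this thread):

# describe with the 512-bit binary LATCH descriptor
openMVG_main_ComputeFeatures -i matches_dir/sfm_data.json -o matches_dir -m LATCH_BINARY

# match with the CLATCH GPU kernel tuned for that 512-bit layout
openMVG_main_ComputeMatches -i matches_dir/sfm_data.json -o matches_dir -n GPU_LATCH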

Please keep me updated! Curious to hear the results.

mdaiter commented 7 years ago

@robeastham if you pull now, the typos in the matching binary should be fixed. Please keep me updated on how your experiments go!

robeastham commented 7 years ago

Thanks for the detailed response. It all makes sense. Does the last commit mean there was a bug stopping things from running properly for LATCH_BINARY, whatever I was doing?

I will pull, compile and try. Hopefully later today.

I'm also wondering if I need to tweak my CMakeLists.txt for LATCH, because I'm 99.99% sure I tested LATCH_UNSIGNED with GPUBRUTEFORCEHAMMING (which looks like it should have worked) and got the 1 KB results described in my OP. I have a GTX 970, and I had already changed CMakeLists.txt before compiling to:

-gencode arch=compute_30,code=sm_30

I think I read somewhere that this is no good for the GTX 970 and that I need to replace the 30 with either 50 or 52. I wonder if there's any way to make this dynamic to support multiple cards, as I might run my Dockerfile on one of two machines with different generations of NVIDIA GPU (a GTX 970 on my server and a GTX 1060 on my laptop), and possibly others if I use it elsewhere.

I'll try to experiment later today, though feel free to point me in the right direction if you already know which compute_** and sm_** codes I should use for each architecture :-)

mdaiter commented 7 years ago

That last commit just changed the formatting of the matching binary's output message. There were no technical bugs, just UI/UX ones.

Don't know the gencode you should use :(

giacomodabisias commented 7 years ago

Hello everyone. I am also experimenting with the new LATCH descriptor, but for now I am having some issues. As soon as I get some results I will keep you posted. @robeastham, you can use compute capability 52 for your card. Have a look here: https://en.wikipedia.org/wiki/CUDA
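To cover both machines with a single binary, nvcc accepts several -gencode flags side by side; a sketch for the two cards mentioned (the GTX 970 is Maxwell, compute capability 5.2; the GTX 1060 is Pascal, 6.1):

-gencode arch=compute_52,code=sm_52    # GTX 970 (Maxwell)
-gencode arch=compute_61,code=sm_61    # GTX 1060 (Pascal)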

robeastham commented 7 years ago

@giacomodabisias thanks for the clarification. I actually recompiled with 52 yesterday and still failed to get anything at the ComputeMatches stage, even when following the recent instructions from @mdaiter.

I realised that the Docker image I'm deriving from actually uses OpenCV 3.2, so perhaps that is my issue. I think I might need to retry with OpenCV 3.1. @mdaiter, do you think 3.2 could be the problem?

giacomodabisias commented 7 years ago

@robeastham the compute capability won't make any difference there; it only lets the compiler use a wider instruction set and emit code optimised for your specific card.

@mdaiter For now I am stuck after computing features and matches, when I try to execute openMVG_main_GlobalSfM. Just to be clear, I did the following (folders are omitted for readability):

openMVG_main_SfMInit_ImageListing -i input_dir -o matches_dir -d camera_file_params -f 430

openMVG_main_ComputeFeatures -i matches_dir/sfm_data.json -o matches_dir -m LATCH_BINARY -p ULTRA --numThreads 56

openMVG_main_ComputeMatches -i matches_dir/sfm_data.json -o matches_dir -g e -v 5 -n GPU_LATCH

openMVG_main_GlobalSfM -i matches_dir/sfm_data.json -m matches_dir -o reconstruction_dir -f ADJUST_ALL

Does this look fine?

mdaiter commented 7 years ago

@robeastham I'm not sure what capabilities were introduced in 3.2. What dataset (if available) are you using?

@giacomodabisias those commands look good. Are you sure you can use all of those threads in your ComputeFeatures phase? I ask because your GPU might not be able to handle all of those queued commands at once. I'm using a stream-based, multi-threaded solution for running CLATCH on the GPU and haven't tested running more than 8 threads at once, although I'm pretty sure 16 threads is the max.

Also, have you added -f 1 to all of your commands when re-running, to ensure they are forced to recompute everything?
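For example, a forced re-run that stays within the tested stream count (same paths as your commands above):

# force recomputation and keep the thread count at the tested maximum of 8
openMVG_main_ComputeFeatures -i matches_dir/sfm_data.json -o matches_dir -m LATCH_BINARY -p ULTRA --numThreads 8 -f 1
openMVG_main_ComputeMatches -i matches_dir/sfm_data.json -o matches_dir -g e -v 5 -n GPU_LATCH -f 1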

giacomodabisias commented 7 years ago

@mdaiter Thanks for the answer. Yes, I have a 56-core machine with 2 NVIDIA P5100 cards, so there should be no issue with processing power.

giacomodabisias commented 7 years ago

Hi @mdaiter, I started working on this again and I am stuck at the same point as before. I get

CleanGraph_KeepLargestBiEdge_Nodes():: => connected Component: 0

Cardinal of nodes: 0
Cardinal of edges: 0

during openMVG_main_GlobalSfM. Everything looks good: descriptors and matches are there, and all the files look correct. Any idea?

pmoulon commented 7 years ago

@giacomodabisias If you have graphviz on your computer, some graph files (svg) should be exported by ComputeMatches. They can help you see the connected components once the geometric matching has been performed.
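If only .dot graphs appear, they can be rendered by hand; a minimal sketch (geometric_matches.dot is a hypothetical filename here, so use whatever ComputeMatches actually exported into matches_dir):

# render a graphviz export to svg and open it; the filename is an assumption
dot -Tsvg matches_dir/geometric_matches.dot -o geometric_matches.svg
xdg-open geometric_matches.svg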