iris-ua / iris_lama

LaMa - A Localization and Mapping library
BSD 3-Clause "New" or "Revised" License

[Discussion] Poor scalability by thread number in PF #6

Open facontidavide opened 5 years ago

facontidavide commented 5 years ago

This is just brainstorming, not really an "issue". You don't need to "solve" it; it is just an open discussion between nerds :)

I noticed that PF SLAM scales quite poorly with the number of threads.

For instance, moving from 4 threads to 8 increases performance by only 50%. Note that the profiler still says that we are using 100% of all 8 CPUs!

I do know that there isn't such a thing as perfect scalability, but in this case I think there "might" be a bottleneck somewhere.
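As a rough back-of-the-envelope check on the 4-to-8-thread numbers above, here is a minimal sketch. It assumes Amdahl's law is a fair model here and that "50%" means throughput at 8 threads is 1.5x the throughput at 4 threads. The function names are made up for illustration and are not part of the library.

```python
def amdahl_speedup(serial_fraction: float, threads: int) -> float:
    """Ideal speedup over a single thread under Amdahl's law."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


def serial_fraction_from_ratio(n1: int, n2: int, ratio: float) -> float:
    """Solve S(n2) / S(n1) == ratio for the serial fraction s.

    With S(n) = 1 / (s + (1 - s) / n), the condition becomes
    s + (1 - s) / n1 == ratio * (s + (1 - s) / n2), which is linear in s.
    """
    a, b = 1.0 / n1, 1.0 / n2
    # s * (1 - a) + a == ratio * (s * (1 - b) + b)
    # => s * ((1 - a) - ratio * (1 - b)) == ratio * b - a
    return (ratio * b - a) / ((1.0 - a) - ratio * (1.0 - b))


# Assumed reading of the observation: 8 threads are 1.5x faster than 4.
s = serial_fraction_from_ratio(4, 8, 1.5)
print(f"implied serial fraction: {s:.3f}")
print(f"model speedup at 4 threads: {amdahl_speedup(s, 4):.2f}")
print(f"model speedup at 8 threads: {amdahl_speedup(s, 8):.2f}")
```

Under those assumptions the implied serial (or otherwise non-scaling) share of the work is about 11%, which would cap the speedup at roughly 9x no matter how many threads are added. This only says how large the non-scaling part would have to be, not where it is.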

I inspected the code and I couldn't find any mutex or potential false sharing, but of course I haven't done an exhaustive search.

eupedrosa commented 5 years ago

I have an image that can help the discussion:

[Image: mt_speedup]

The number of particles is 30.

In my opinion, there are a few things that can explain this behavior:

facontidavide commented 5 years ago

I have the feeling that it is mostly related to point 3, but I might be wrong.

Anyway, the performance gain decreases rapidly above 4 threads.