opensciencegrid / xrootd-lcmaps

Integration between Xrootd and LCMAPS.
Apache License 2.0
4 stars 11 forks source link

Use single shared mutex for LCMAPS calls from XrdLcmaps and XrdHttpLcmaps #16

Closed jthiltges closed 6 years ago

jthiltges commented 6 years ago

On a moderately busy xrootd system serving both xrootd and https protocols, we're seeing segfaults with varying code paths, but seeming to happen in this area:

xrootd -> xrootd-lcmaps -> openssl -> globus -> openssl

I'm suspecting that it may be due to contention between the XrdLcmaps and XrdHttpLcmaps callouts to LCMAPS. If two calls happen simultaneously, memory will get corrupted and/or double-free'd.

This patch changes the mutex protecting LCMAPS such that both protocols are protected with a single mutex.

edquist commented 6 years ago

So this is changing the mutex type from XrdSysMutex to std::mutex ?

We are replacing one call of guard(m_mutex) with guard(lcmaps_mutex) - is it going to cause any problems for the other places that still call guard(m_mutex) ?

Does the new code work & resolve the problem for you?

jthiltges commented 6 years ago

Dear Carl,

Thanks for looking over the patch and good questions!

It replaces the XrdSysMutex in XrdLcmaps.cc with a std::mutex, that's correct. Could go either way, but the newer XrdHttpLcmaps code used the std::mutex, so went that direction.

For the other uses of m_mutex, XrdHttpLcmaps.cc looks safe. The m_mutex locks only protect std::map m_map. In GlobusSupport.cc, I believe that m_mutex is a different variable.

The new code has worked fine and appears to resolve the issue for us. I've not been able to trigger the crashs, so it's been a matter of waiting for our local redirector to segfault. Without the patch, it was segfaulting several times per day. After the patch, it has been running smoothly. Sadly, xrootd under valgrind runs far too slowly for us to use in production.

Best regards, John

edquist commented 6 years ago

Ok, that sounds fine then. Glad to hear the fix is working for you!

The only other thing that got my attention is, the new lcmaps_mutex variable is put at the global namespace, rather than defining it as a static member of some class (like what's done here). But looking it over, there doesn't seem to be an obvious better place to define it as a static class member rather than at the global namespace, given that it's used in both XrdLcmaps.cc and XrdHttpLcmaps.cc. So off hand i don't have anything better to suggest. We'll just have to pay attention to names if more locks are needed in the future.

jthiltges commented 6 years ago

That's indeed the issue I saw, @edquist. Without the extern, linking fails with these errors: CMakeFiles/XrdLcmaps.dir/src/XrdHttpLcmaps.cc.o:(.bss+0xa0): multiple definition of `lcmaps_mutex'

I renamed the variable as requested.

bbockelm commented 6 years ago

Bah, you're right. I was thinking about C++ class static variables (which don't need the extern modifier).

Anyhow, I see a small unrelated cmake issue with newer versions of CMake. Will push a fix for that then merge this.

jthiltges commented 6 years ago

Thank you Brian and Carl!