Closed jthiltges closed 6 years ago
So this is changing the mutex type from XrdSysMutex
to std::mutex
?
We are replacing one call of guard(m_mutex)
with guard(lcmaps_mutex)
- is it going to cause any problems for the other places that still call guard(m_mutex)
?
Does the new code work & resolve the problem for you?
Dear Carl,
Thanks for looking over the patch and good questions!
It replaces the XrdSysMutex
in XrdLcmaps.cc with a std::mutex
, that's correct. Could go either way, but the newer XrdHttpLcmaps code used the std::mutex
, so went that direction.
For the other uses of m_mutex
, XrdHttpLcmaps.cc looks safe. The m_mutex
locks only protect std::map m_map
. In GlobusSupport.cc, I believe that m_mutex
is a different variable.
The new code has worked fine and appears to resolve the issue for us. I've not been able to trigger the crashs, so it's been a matter of waiting for our local redirector to segfault. Without the patch, it was segfaulting several times per day. After the patch, it has been running smoothly. Sadly, xrootd under valgrind runs far too slowly for us to use in production.
Best regards, John
Ok, that sounds fine then. Glad to hear the fix is working for you!
The only other thing that got my attention is, the new lcmaps_mutex
variable is put at the global namespace, rather than defining it as a static member of some class (like what's done here). But looking it over, there doesn't seem to be an obvious better place to define it as a static class member rather than at the global namespace, given that it's used in both XrdLcmaps.cc
and XrdHttpLcmaps.cc
. So off hand i don't have anything better to suggest. We'll just have to pay attention to names if more locks are needed in the future.
That's indeed the issue I saw, @edquist. Without the extern
, linking fails with these errors:
CMakeFiles/XrdLcmaps.dir/src/XrdHttpLcmaps.cc.o:(.bss+0xa0): multiple definition of `lcmaps_mutex'
I renamed the variable as requested.
Bah, you're right. I was thinking about C++ class static variables (which don't need the extern
modifier).
Anyhow, I see a small unrelated cmake issue with newer versions of CMake. Will push a fix for that then merge this.
Thank you Brian and Carl!
On a moderately busy xrootd system serving both xrootd and https protocols, we're seeing segfaults with varying code paths, but seeming to happen in this area:
I'm suspecting that it may be due to contention between the XrdLcmaps and XrdHttpLcmaps callouts to LCMAPS. If two calls happen simultaneously, memory will get corrupted and/or double-free'd.
This patch changes the mutex protecting LCMAPS such that both protocols are protected with a single mutex.