microsoftarchive / redis

Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes
http://redis.io
Other
20.78k stars 5.37k forks source link

deadlock loading dll #538

Closed jeperk closed 7 years ago

jeperk commented 7 years ago

Any DLL that includes redis will have difficult to track down issues that present as dealocks or crashes.

The global static class Win32_FDSockMap uses loadlibrary to access winsock.

This static class is instantiated at dll load time where loadlibrary is unsafe to use. https://msdn.microsoft.com/en-us/library/dn633971(v=vs.85).aspx#general_best_practices "You should never perform the following tasks from DllMain: Call LoadLibrary or LoadLibraryEx (either directly or indirectly). This can cause a deadlock or a crash."

In addition, the constructor for this class also calls WSAStartup (still in DllMain): https://msdn.microsoft.com/en-us/library/windows/desktop/ms742213(v=vs.85).aspx "The WSAStartup function typically leads to protocol-specific helper DLLs being loaded. As a result, the WSAStartup function should not be called from the DllMain function in a application DLL. This can potentially cause deadlocks. For more information, please see the DLL Main Function."

jepickett commented 7 years ago

Why are you consuming Redis source in the creation of your DLL? Redis was developed assuming a process container. Changing that assumption may have consequences beyond that which you have outlined here.

jeperk commented 7 years ago

I'm not actively using it in the creation of my dll, but I am linked to the redis dll so naturally it will be loaded when my dll is loaded.

When the redis dll loads, it performs several actions that Microsoft has warned are unsafe and should never be done in DllMain.

They are not obvious because it's hidden inside a global rather than being explicitly called in DllMain. Globals are instantiated when the dll is loaded, causing that global's constructor to do things from a thread that the author didn't anticipate. It's the equivalent of doing those actions directly in the DllMain.

Here we have quite a few developers that experienced the issues caused by it. Depending on many timing factors, on some systems it caused a lock very rarely, on some systems it was >50% of the time.

I worked around the issue by modifying redis to not perform the actions above and added an initialize method that performs them later.

This worked 100% here but I wouldn't recommend committing this code because it's a bit of a hack.

jeperk commented 7 years ago

In Win32_FDAPI.cpp: // guarantee global initialization static class Win32_FDSockMap& init = Win32_FDSockMap::getInstance();

The constructor for the global Win32_FDSockMap contains: Win32_FDSockMap() { InitWinsock();

If you set a breakpoint in InitWinsock the callstack of the first hit should show you that you are effectively in the DllMain of redis at this point which is known to cause hard to find problems.

jepickett commented 7 years ago

You have described what you are doing, but not why you are doing it. Why load it onto your process space at all? For Redis to function well on Windows there are many heavy handed things required to simulate a Posix environment. By bringing Redis into your process space you might well be inheriting behavior detrimental to your own process. Lazy initialization is usually the better paradigm. In the interest in maintaining maximum code purity relative to mainline(antirez), this was not done in Redis for heap and socket inialization. Thus process container isolation is the next best alternative.

If you must load Redis into your process space, why is DLL load time linking preferable to dynamically loading later on? Is there anything you can't achieve with a LoadLibrary called from an initialization function called after your DLL is loaded?

jeperk commented 7 years ago

I can't really go into details because it's a commercial product, but my dll is just a thin wrapper that our watchdog process starts, stops and monitors.

LoadLibrary would still cause the same issue, calling calling WSAStartup from DLLMain.

It's the act of the library being opened at all that performs the unsafe actions.

Different methods of loading the DLL could change the frequency of the problem on a particular system, but that's like adding a Sleep(1) to avoid a race condition.

As this global's constructor is designed you can't completely avoid it, the DLL will simply not start up 100% reliably on all systems all the time.

My hack was to not call InitWinsock in the constructor of Win32_FDSockMap and instead provide a static method I could call once from my own constructor, of course that's a less user friendly solution and you have to call it exactly once.

In my case that was an acceptable workaround to the frequent lockups.

jepickett commented 7 years ago

I value the purity of this porting effort. Features were designed to be as orthogonal to the original code as possible, so as to not add unintended behaviors. If you call CreateProcess and hold onto the process handle, what is it that you are not able to do? If you can describe what you need to hook in Redis, perhaps that could be provided in a better way down the road.

jeperk commented 7 years ago

I think I finally see why we seem to be on two separate pages.

I'm not the person who integrated redis into our system, I was the one who determined the cause of the bug we were seeing.

I assumed dll was a standard build configuration for redis. Glancing over your repository just now, that doesn't seem to be the case. The dll configuration must have been made inhouse and I assumed it was a normal option for redis.

If it is correct that redis is not built as a dll then I'll discuss this with my coworkers and you can close this ticket, it's simply not applicable.

Thanks

jepickett commented 7 years ago

Just like the linux version, the Windows version Redis is compiled to a process container (exe) rather than as a linkable module(dll).