zeromq / czmq

High-level C binding for ØMQ
czmq.zeromq.org
Mozilla Public License 2.0

zpoller_wait() will fail if WiFi networks are switched or after Windows sleep and even rebuilding all sockets doesn't make it recover #2276

Open devlpr opened 7 months ago

devlpr commented 7 months ago

I'm currently trying to figure out why zpoller_wait() fails after Windows goes to sleep and wakes back up, or after switching between two available WiFi networks.

Context: Windows 11, C++, czmq 4.3.3 and 4.3.5, using CURVE encryption, user-facing UI application

The application works perfectly until the machine goes to sleep or the WiFi network is switched to a different one. The issue manifests as zpoller_wait() not returning a valid socket. Restarting the application makes it recover, so I know the issue is not on the server side.

I have tried destroying the pollers, sockets, and every other zmq construct in the client, but nothing lets it recover short of a full application restart. Is there some latent state somewhere holding onto the old system resources? If those resources could be re-enumerated to refresh that state, it seems like it would work.

I've been running it through the debugger but it isn't clear to me what is happening yet.

sphaero commented 4 months ago

Windows is a bit of a second-class citizen, I'm afraid. We just don't have many Windows devs, I guess. I'm willing to have a look as well, but I only have Windows build hosts, which never sleep. Do you have a code example showing the bug?

devlpr commented 4 months ago

I'll see if I can create one. I'm 95% on Linux as well. Windows is not my favorite, but people still want Windows software, so I need to support the use case.

Asmod4n commented 3 months ago

This is apparently how libuv does it: https://github.com/libuv/libuv/blob/master/src/win/detect-wakeup.c

sphaero commented 3 months ago

Thanks for the ref! Seems only relevant for Win8 and later. Docs here: https://learn.microsoft.com/en-us/windows/win32/api/powerbase/nf-powerbase-powerregistersuspendresumenotification
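Following the libuv approach linked above, a client could register for resume notifications and let its own poll loop notice them. This is only a sketch under stated assumptions, not czmq code: `note_system_resume()`, `resume_since()`, `on_power_event()` and `register_resume_callback()` are hypothetical names. The Windows part uses `PowerRegisterSuspendResumeNotification` with `DEVICE_NOTIFY_CALLBACK` (Windows 8 and later, header `powrprof.h`, link `PowrProf.lib`); the counter logic is plain C11, so it builds on any platform.

```c
/* Sketch: count system resumes so the loop that owns the sockets can react.
 * Helper names are hypothetical; only the Win32 calls are real APIs. */
#include <stdatomic.h>
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#include <powrprof.h>   /* PowerRegisterSuspendResumeNotification (Win8+), link PowrProf.lib */
#endif

/* Incremented once per resume; read by the thread that owns the ZeroMQ sockets. */
static atomic_uint resume_generation = 0;

/* Portable core: record that the machine resumed from sleep. */
static void note_system_resume(void)
{
    atomic_fetch_add(&resume_generation, 1u);
}

/* Returns true if at least one resume happened since the caller last looked,
 * and updates *last_seen to the current generation. */
static bool resume_since(unsigned *last_seen)
{
    unsigned now = atomic_load(&resume_generation);
    bool changed = (now != *last_seen);
    *last_seen = now;
    return changed;
}

#ifdef _WIN32
/* Called by Windows on power events; it only flags the resume, no czmq calls here. */
static ULONG CALLBACK on_power_event(PVOID context, ULONG type, PVOID setting)
{
    (void)context;
    (void)setting;
    if (type == PBT_APMRESUMESUSPEND || type == PBT_APMRESUMEAUTOMATIC)
        note_system_resume();
    return 0;
}

/* Registers on_power_event; returns NULL if registration fails. */
static HPOWERNOTIFY register_resume_callback(void)
{
    static DEVICE_NOTIFY_SUBSCRIBE_PARAMETERS params = { on_power_event, NULL };
    HPOWERNOTIFY handle = NULL;
    if (PowerRegisterSuspendResumeNotification(DEVICE_NOTIFY_CALLBACK, &params, &handle)
            != ERROR_SUCCESS)
        return NULL;
    return handle;
}
#endif
```

The callback is kept trivial because it presumably runs on a thread other than the one that owns the sockets, and ZeroMQ sockets are not thread-safe; the socket-owning loop would call `resume_since()` between `zpoller_wait()` calls instead. The handle can later be released with `PowerUnregisterSuspendResumeNotification()`.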

devlpr commented 3 months ago

Any idea what code would need to be registered there? If I know what needs to be added, I can do the PR. I just don't know where the code is that does the enumeration on Windows. Any information is appreciated. Happy to contribute if I understand what to do.

devlpr commented 3 months ago

I see some uses of GetAdaptersAddresses in libzmq/src/ip_resolver.cpp and czmq/src/ziflist.c. The use in ziflist.c seems most pertinent, given the comment "// Helper to reload network interfaces from system". That seems to be what needs to happen here. Can anybody confirm that it is the correct function to run on wake-up? If so, the next question will be which list to update with it.

devlpr commented 3 months ago

On second thought, it seems that the ip_resolver.cpp usage ends up getting called in zpoller.c in s_rebuild_poll_set(). That's in the code path of zpoller_wait, which may make more sense.
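To detect a WiFi switch, one option is to snapshot the interface list (for example via `ziflist_reload()` followed by `ziflist_first()`/`ziflist_next()` and `ziflist_address()`, or from `GetAdaptersAddresses()` directly) and compare it with the previous snapshot. The sketch below assumes the entries have already been copied into plain strings; `iface_entry` and `iface_fingerprint()` are hypothetical, and FNV-1a is just a convenient hash, not something czmq or libzmq uses here. It only catches changes that alter an interface's name or address, so two networks that hand out the same address would look identical.

```c
/* Sketch: order-independent fingerprint of the current interface list. */
#include <stddef.h>
#include <stdint.h>

/* One interface as reported by the OS (hypothetical type; the strings could be
 * copied from ziflist or from GetAdaptersAddresses()). */
typedef struct {
    const char *name;
    const char *address;
} iface_entry;

/* 64-bit FNV-1a over a NUL-terminated string, continuing from hash h. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;            /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Each entry is hashed on its own and the hashes are summed, so the same set
 * of interfaces yields the same value regardless of enumeration order. */
static uint64_t iface_fingerprint(const iface_entry *list, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
        h = fnv1a(h, list[i].name);
        h = fnv1a(h, "\x1f");                  /* separator: "ab"+"c" != "a"+"bc" */
        h = fnv1a(h, list[i].address);
        total += h;
    }
    return total;
}
```

A client would store the fingerprint after connecting and recompute it after a resume or periodically; a different value suggests the network changed under the existing sockets.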

sphaero commented 3 months ago

You could also look at how zloop rebuilds its poll set, if that is of any relevance.

https://github.com/zeromq/czmq/blob/db940448315b71ceb6099fce6d73918a94daa98a/src/zloop.c#L244

However, this might need to be done in libzmq, as the sockets might need to be recreated?
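In the spirit of zloop rebuilding its poll set, change signals could feed a dirty flag that the client checks once per loop pass, so it tears down and recreates its czmq objects only when something actually changed. The `client_state` struct and its helpers are hypothetical; the resume generation and interface fingerprint are assumed to come from whatever detection mechanism the application uses.

```c
/* Sketch: dirty-flag rebuild decision for a client loop (hypothetical names). */
#include <stdbool.h>
#include <stdint.h>

/* Client bookkeeping; a real client would also own its zsock_t* and zpoller_t*. */
typedef struct {
    unsigned seen_resume_gen;   /* last resume generation observed */
    uint64_t seen_iface_fp;     /* last interface fingerprint observed */
    bool     need_rebuild;      /* dirty flag, consumed once per loop pass */
    unsigned rebuilds;          /* number of rebuilds requested so far */
} client_state;

/* Record the latest observations; any change marks the client dirty. */
static void client_observe(client_state *c, unsigned resume_gen, uint64_t iface_fp)
{
    if (resume_gen != c->seen_resume_gen || iface_fp != c->seen_iface_fp)
        c->need_rebuild = true;
    c->seen_resume_gen = resume_gen;
    c->seen_iface_fp = iface_fp;
}

/* Returns true exactly once per change: the caller should rebuild now. */
static bool client_take_rebuild(client_state *c)
{
    if (!c->need_rebuild)
        return false;
    c->need_rebuild = false;
    c->rebuilds++;
    return true;
}
```

When `client_take_rebuild()` returns true, the loop would destroy the poller and sockets (`zpoller_destroy()`, `zsock_destroy()`) and create fresh ones before the next `zpoller_wait()`. This only decides when to rebuild; since destroying every czmq object did not help in this report, stale state inside libzmq may mean the rebuild has to happen there instead.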

sphaero commented 2 months ago

I think this app hooks into the power events as well:

https://github.com/RuiVarela/Senos/blob/682d171ea11a1670cc65408c5b682e7dc8719feb/app/Platform_win.cpp#L192