Open Legend-Master opened 1 year ago
@hamishcoleman May I ask why do we have such a long wait time that blocks exiting?
And I don't really think this is necessary, since we'll exit after this and stop reading anyway
https://github.com/ntop/n2n/pull/1111#issuecomment-1601315352
@hamishcoleman I added some logging and this is what I got, note that the time waiting for tun read thread is going to be much longer if you keep edge open for a while
And it did not go through Sleep(INFINITE)
at all
22/Jun/2023 10:12:03 [edge_utils.c:2597] [OK] edge <<< ================ >>> supernode
22/Jun/2023 10:12:03 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:03 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:03 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:13 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:13 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:13 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:13 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:23 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:23 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:23 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:23 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:23 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:23 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:33 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:33 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:43 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:43 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:43 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:43 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:43 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:43 [edge_utils.c:2939] start waiting for select
22/Jun/2023 10:12:46 [edge.c:977] shutting down...
22/Jun/2023 10:12:53 [edge_utils.c:2941] finish waiting for select
22/Jun/2023 10:12:53 [edge_utils.c:3061] start waiting for tun read thread
22/Jun/2023 10:13:06 [edge_utils.c:3063] finish waiting for tun read thread
22/Jun/2023 10:13:06 [edge_utils.c:2864] **********************************
22/Jun/2023 10:13:06 [edge_utils.c:2865] Packet stats:
22/Jun/2023 10:13:06 [edge_utils.c:2866] TX P2P: 0 pkts
22/Jun/2023 10:13:06 [edge_utils.c:2867] RX P2P: 0 pkts
22/Jun/2023 10:13:06 [edge_utils.c:2868] TX Supernode: 18 pkts (18 broadcast)
22/Jun/2023 10:13:06 [edge_utils.c:2869] RX Supernode: 0 pkts (0 broadcast)
22/Jun/2023 10:13:06 [edge_utils.c:2870] **********************************
I think the problem with select
is that, on Linux
https://man7.org/linux/man-pages/man7/signal.7.html
Interruption of system calls and library functions by signal handlers If a signal handler is invoked while a system call or library function call is blocked, then either:
the call is automatically restarted after the signal handler returns; or
the call fails with the error EINTR.
This is not the case on Windows
Yes, that is true, however the select timeout is set at 10 seconds, so it should not take 3-9 minutes.
Yes, that is true, however the select timeout is set at 10 seconds, so it should not take 3-9 minutes.
From that log, select
was taking exactly 10 seconds, but https://github.com/ntop/n2n/blob/6ecc201017729e3065905fe7ef085d2d1f49f06c/src/edge_utils.c#L3026-L3028 took 13
I think the extremely long time is caused by waiting tun read thread, it takes no time if you ctrl c immediately, but if you leave it running for a while, it will wait a long time
has this bug been fixed?
Not yet, see #1121
No work is currently being done on this repository.
However, instead of the large patch linked above, you could apply this one line patch https://github.com/hamishcoleman/n3n/commit/4e282a1c37704bcd05b465a00152cf4e3d1b11fe
There're 2 different blockers
Select not returning properly on Windows is also solvable with a couple of different one-line patches. One solution for select not returning properly is to simply wait 10 seconds on shutdown - as that is quite short compared to the potentially infinite timeout for a new tap packet to arrive (and a lot less invasive than adding a whole lot of windows specific threading code). Either way, no changes are being made to this repository
Do we wanna consider reduce the timeout on Windows as an alternative? Maybe like 1 second? Close window timeout is around 5 seconds, and if we can't clean up in that time, the whole program is terminated
Sorry, I'm not considering anything in this repository.
Alright
@Legend-Master , I've been thinking about the various "one-line" options I mentioned above and believe that the simplest is just to close one of the sockets that is included in the select() fdset - see https://github.com/hamishcoleman/n3n/commit/7427a712fb66dcd3bdf5ed1e743fa90253820f92 for the implementation I ended up using
That's a solution too, since we're shutting down anyway, it makes sense 😀
@Legend-Master , will this be fixed in the next release?
I think there're some CLA problems currently, and the next release won't be coming soon
You can use https://github.com/hamishcoleman/n3n for now though
Shutting down the edge will take a while most of the times
After doing some debugging, it seemed that
select
function is blocking us from shutting down (up to 10s when not waiting for server response)https://github.com/ntop/n2n/blob/709590d66e4aed880b6fc2338d516d20437c17ad/src/edge_utils.c#L2876