Closed gdamore closed 1 month ago
Attention: Patch coverage is 81.81818%
with 2 lines
in your changes are missing coverage. Please review.
Project coverage is 79.50%. Comparing base (
aac4dc3
) to head (a5f1a8f
).
Files | Patch % | Lines |
---|---|---|
src/sp/transport/tcp/tcp.c | 0.00% | 1 Missing :warning: |
src/sp/transport/tls/tls.c | 0.00% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hi @alzix - obviously I didn't get a chance to try building this on Windows myself (except via CI/CID) before I ran out of time. I'll fix these .. .but if you've already done so, can you tell me if the approach at least resolves the problems you've been observing?
no i still got a crash
For the sake of completeness I want to mention that the reqrep code provided by @alzix also sometimes crashes in nni_stat_inc
. It also looks like the code still hangs sometimes.
I also saw the crash in the nni_stat_inc
especially when running with a debugger and stopping threads for some time.
IMO we better concentrate on this critical issue first.
Statistics can be compiled out if it cause problems. @KnappSas - can you please open a different issue for it - so it won't get forgotten
HI @gdamore, I also tried this branch on a Windows machine and got the same crash:
It seems that when calling nni_list_node_remove(&aio->a_prov_node)
than a_prov_node
has the next&prev pointers point to free or uninitialized memory, I wasn't able to understand where is this node gets initialized/freed.
It is very easy to reproduce as it happens almost immediately when running the modified reqrep demo.
Hi @gdamore and all.
I've been looking into this and it seems as if the pipe is being deallocated/freed (in ipc_pipe_fini
) but after that a call to ipc_send_cb
is received (from win_io_handler
).
Inside ipc_send_cb
the nni_aio
pointer has the same address of the tx_aio
member of ipc_pipe
. Since tx_aio
is a by value member and since the the ipc_pipe
has been deallocated then this pointer is a dangling pointer.
Also worth mentioning that sometimes, not very often though there's no crash but a deadlock, where both client and server are waiting on signals.
THanks -- I've been looking at this in spare cycles, but very few such cycles exist, which is why it's taking time. I'll try to get to it in a day or so.
So the design is meant to ensure that all callbacks are done, and the pipe is totally unregistered before we freed it. I've missed something. I'm going to resolve this one way the other this weekend.
CLosing I merged a different branch with different approach.
This is a trial run, and it may or may not work.