🐛 Bug Report
This bug occurs for no clear reason. To reproduce it, I have to run multiple instances of the test in parallel with:
ctest --repeat until-fail:10000 -VV -R metacall-node-python-await-extended-test
This happens when NodeJS tries to exit while there are still async handles waiting to be executed. The behavior is strange: the process works as expected and then suddenly enters a livelock, blocking it at 100% CPU. It gets stuck at uv__async_io.
It is possible to stop and debug with GDB if you run the container with:
docker run --rm --privileged -it metacall/core:dev bash
The destroy hook for handling the handle counter seems to be what triggers this, but I am not sure whether it happens because we are abusing the async handles or because of the destroy mechanism itself.
Two options to improve this are:
1) Create a better way of handling async calls that does not create one async resource for each async call. We could implement this with a queue or similar and handle all the async calls with only one async resource. This may also improve performance.
2) Debug in detail what exactly is being triggered from the destroy mechanism, review the check or prepare hooks, and then use only one that does not interfere with the async step in the libuv event loop.
Here are the logs of two runs of the livelock issue:
And the other:
I am going to comment out the test until we can debug this properly, because we need the tests passing for other tasks.
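As a rough illustration of option 1 above, here is a minimal Python sketch of the queue idea. It is not the node loader code and it does not use libuv; it only models the pattern. Every async call is pushed onto one shared queue, and a single wakeup stands in for signalling the one shared async resource (with libuv that would presumably be one uv_async_t signalled through uv_async_send). AsyncCallQueue, submit, drain and the wakeup callback are hypothetical names made up for this example.

```python
import threading
from collections import deque
from typing import Callable, Deque, List


class AsyncCallQueue:
    """Hypothetical model: funnel many async calls through one wakeup."""

    def __init__(self, wakeup: Callable[[], None]) -> None:
        self._calls: Deque[Callable[[], None]] = deque()
        self._lock = threading.Lock()
        # Assumption: stand-in for signalling the single shared async resource.
        self._wakeup = wakeup
        self._wakeup_pending = False

    def submit(self, call: Callable[[], None]) -> None:
        """Queue a call from any thread; signal only if no wakeup is pending."""
        with self._lock:
            self._calls.append(call)
            if self._wakeup_pending:
                return  # a wakeup is already on its way, nothing else to create
            self._wakeup_pending = True
        self._wakeup()

    def drain(self) -> int:
        """Run on the event loop thread: execute every call queued so far."""
        with self._lock:
            batch = list(self._calls)
            self._calls.clear()
            self._wakeup_pending = False
        for call in batch:
            call()
        return len(batch)


# Usage: 100 async calls share one wakeup instead of 100 async resources.
wakeups: List[int] = []
results: List[int] = []
queue = AsyncCallQueue(wakeup=lambda: wakeups.append(1))

for i in range(100):
    queue.submit(lambda i=i: results.append(i))

executed = queue.drain()
```

In this model the 100 submitted calls produce a single wakeup and are all executed in one drain, so no per-call resource has to be created and destroyed. Whether this actually avoids the livelock is untested; it only shows the shape of the change.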