Closed benmandrew closed 4 months ago
I have been unable to create a minimal example using only opam lint
inside an Lwt_preemptive
thread — I suspect that there is some interaction with the OCurrent pipeline that is causing the issue.
If others see this issue, then I propose removing the preemptive threading. I'd rather correct code that's sometimes a little unresponsive.
I propose removing the preemptive threading. I'd rather correct code that's sometimes a little unresponsive.
I am in favor of this change. It would be nice if we could first test it out on some large input just to safety check whether it may have unexpectedly high cost. But from what I understand we have no really strong reason to think so.
It would be nice to know why this happens though. A debugger will show what all the threads are doing when it's stuck, which might be useful. If a thread is stuck in the kernel, /proc/PID/stack will show what it's doing (on Linux).
I'm unfamiliar with kernel-level stuff so that's a bit out of my scope for now.
Some added context for anyone who's investigating is that this issue does not seem to appear on Linux, only MacOS.
When running
opam-repo-ci-service
locally, once all of the jobs have finished the service goes into a state where the OCurrent graph does not render in the web-ui, and the process cannot be Ctrl-C'd. This may be related to https://github.com/ocurrent/opam-repo-ci/issues/293.The issue occurs because in the linting step, we call
OpamFileTools.lint
inside a pre-emptive thread.https://github.com/ocurrent/opam-repo-ci/blob/18a5dd1db585c0003ee538eaad09e206fc3049fe/lib/lint.ml#L300-L309
If I remove the threading as shown below, then I no longer see the issues on my machine.
The threading was introduced in This was introduced in https://github.com/ocurrent/opam-repo-ci/pull/98 to avoid 504s when checking "large PRs" (I assume this means PRs with many packages), as
OpamFileTools.lint
blocks the main thread until it finishes. Note thatLwt_preemptive
is also used for downloading theopam-repository
tree with no issue, so this may be a problem withopam lint
specifically.https://github.com/ocurrent/opam-repo-ci/blob/18a5dd1db585c0003ee538eaad09e206fc3049fe/lib/lint.ml#L101-L109
(I'm running on an ARM64 Mac.)