Open burnout87 opened 12 months ago
It's a duplicate of https://github.com/oda-hub/dispatcher-plugin-nb2workflow/issues/72
Actually, I'll keep it open until the duplicate is confirmed.
This is not a duplicate: tests for both plugins were failing. The complication is that it's not always reproducible, and the causes (or at least the manifestations in the logs) were probably different.
I think the same error appeared again:
https://github.com/oda-hub/dispatcher-app/actions/runs/6484236816/job/17607602126
Yes, it's the same error, and I can't reproduce it locally. I see lots of timeouts while polling the dispatcher with oda-api in test_full_stack.
The dispatcher is functional, though, and replies correctly, but only after the request has already timed out.
We discussed that we want to profile and reduce latency if possible. But for the time being, since it only appears in the pipeline, I propose https://github.com/oda-hub/dispatcher-plugin-nb2workflow/pull/73
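For reference, a minimal sketch of the kind of polling loop involved, assuming a local test dispatcher at a hypothetical URL and a `/run_analysis` endpoint that reports `query_status`; the real tests go through oda-api, so the names and timeout values here are illustrative only:

```python
import time
import requests

DISPATCHER_URL = "http://localhost:8001"  # hypothetical local test instance

def poll_run_analysis(params, total_timeout=120, poll_interval=5):
    """Poll the dispatcher until the query leaves the submitted/progress state
    or the overall deadline is reached (illustrative helper, not the real client)."""
    deadline = time.monotonic() + total_timeout
    while time.monotonic() < deadline:
        # keep the per-request timeout short so a slow reply surfaces as a retry
        # instead of failing the whole test
        response = requests.get(f"{DISPATCHER_URL}/run_analysis",
                                params=params, timeout=30)
        response.raise_for_status()
        status = response.json().get("query_status")
        if status not in ("submitted", "progress"):
            return response.json()
        time.sleep(poll_interval)
    raise TimeoutError(f"dispatcher query still not finished after {total_timeout}s")
```

If the dispatcher only answers after the per-request timeout, a loop like this keeps retrying until the overall deadline, which is roughly what raising the pipeline timeout in the linked PR buys us.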
I re-ran the workflow, and this time it completed.
Still, let's keep this issue open for some time
I think I encountered the same issue again; is it related? A TimeoutError is mentioned.
https://github.com/oda-hub/dispatcher-app/actions/runs/7087471000/job/19287806505?pr=626
That's a different kind of timeout:
TimeoutError: The provided start pattern Serving Flask app could not be matched within the specified time interval of 30 seconds
It's related to the live_nb2service fixture, which starts nb2service as a separate process via the xprocess library and waits for 'Serving Flask app' in its stdout.
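For context, a minimal sketch of what such a fixture looks like with pytest-xprocess; the actual live_nb2service fixture lives in the plugin tests, and the command line and port below are assumptions:

```python
import pytest
from xprocess import ProcessStarter

@pytest.fixture
def live_nb2service(xprocess):
    class Starter(ProcessStarter):
        # xprocess scans the child's stdout for this pattern;
        # if it is not seen within `timeout` seconds it raises TimeoutError
        pattern = "Serving Flask app"
        timeout = 30
        # hypothetical command line; the real fixture starts nb2service
        # with its own arguments and test notebooks
        args = ["nb2service", "--port", "9393", "tests/example_notebooks"]

    xprocess.ensure("nb2service", Starter)
    yield "http://127.0.0.1:9393"
    # make sure the process does not outlive the test
    xprocess.getinfo("nb2service").terminate()
```

So the TimeoutError above means the started process never printed the expected pattern within 30 seconds, either because it was slow to start or because it failed to start at all.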
Didn't we have changes to nb2workflow which could, e.g., affect the verbosity?
xprocess has sometimes caused issues when debugging tests locally: a test that doesn't terminate cleanly can leave the process running, making it impossible to start another one because the port is already in use. But in CI it has always worked well. I will investigate further.
It was a transient issue; I'm not sure what the cause was, and I wasn't able to reproduce it locally. Then I restarted the CI job and it passed.
ok, it did the same for me
It would be better to have more robust process starting/tracking behavior to avoid this. Though I suspect that this particular issue will not happen in production, since if the service is not starting, the pod will be recreated.
Still, let's keep this open for tracking.
Though I suspect that this particular issue will not happen in production since if the service is not starting, the pod will be recreated.
Exactly, this mechanism is only used in tests
Well, it might be that the server is not starting for some reason; that would then be an issue for nb2workflow itself. A port already being in use is indeed a common problem; in the dispatcher I made a custom xprocess analog which tries to deal with this. I think we could adapt xprocess to behave better, but let's leave it for now.
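As one possible mitigation (a sketch only, not what the dispatcher's custom starter actually does), the fixture could check whether the port is still held by a stale process, or ask the OS for a free one, before launching the service:

```python
import socket

def port_in_use(port, host="127.0.0.1"):
    # a successful connect means something (possibly a stale nb2service
    # left over from an aborted test run) is still listening on the port
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

def find_free_port(host="127.0.0.1"):
    # binding to port 0 lets the OS pick an unused ephemeral port,
    # which the fixture can then pass to the service on its command line
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]
```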
looks like it happened again:
https://github.com/oda-hub/dispatcher-app/actions/runs/7130587275/job/19417305982
Production was down for some 15 minutes; is something there calling it?
Now that I've noticed it, I've also noticed some crashes elsewhere.
During the development of https://github.com/oda-hub/dispatcher-app/pull/585 , the following workflow failed:
https://github.com/oda-hub/dispatcher-app/actions/runs/6149352313/job/16685039350
Please feel free to move this issue somewhere more suitable if needed.