Open dsavchenko opened 1 year ago
Email sending was implemented by @burnout87, could you please help here?
By the way, to be clear, I think what you mean here is what we call async backends: backends that send updates to the dispatcher instead of responding in the first call.
Async dispatcher is another feature, which we do not usually use: the dispatcher queues requests in celery and responds to the API client with "submitted" while it is processing them.
Or do I misunderstand?
By the way, in these terms, don't we generally use an async backend for nb2workflow? It shows "submitted" and all that.
Thanks, I would also need some more details about the setup that is causing the issue.
call_back endpoint?
My guess is that something in your setup triggers this behavior, but I'd be curious to see it live to understand better what to investigate.
One way or another the issue must be that the operation to send email is synchronous, blocking.
It may not be desirable to change this right now, since it might be complex (introduce async def throughout the dispatcher? a subprocess or a thread?) and since during a callback this is probably not visible to the user.
Does the rest of the processing complete, though? The record about completion, for example, should be produced nevertheless. Is it?
It's my staging setup in k8s
The dispatcher there can't send email for some reason (I use the lab's mail server and get an authentication failure there; it's probably a configuration issue). Since the dispatcher runs as a single pod with one synchronous process, it is blocked while it tries to send the email. After a limited number of attempts, it gives up and unblocks. For the frontend, though, this leads first to a dispatcher response timeout and then to an unresponsive dispatcher returning 503 errors for some time.
Everything else looks good.
Don't you run gunicorn with multiple workers?
Really, I didn't think about it. I installed it with the chart, and gunicorn seems not to be the default there.
It helps with the 503, but the initial gateway timeout is still there.
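For reference, a minimal sketch of what running with multiple workers could look like as a gunicorn config file. The value is illustrative, not what the chart sets, and the dispatcher's WSGI entry point is not shown in this thread, so it is left out here.

```python
# gunicorn.conf.py (illustrative value, not taken from the chart)

# Several worker processes: while one worker is blocked on a slow email
# attempt, the others can still answer incoming requests.
workers = 4
```

This only keeps the other requests served. The request that triggers the email still waits for the send to finish, which would match the gateway timeout remaining after the 503 went away.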
There is a time.sleep of half a second before each new email sending attempt, which probably causes the issue.
Half a second can't cause a problem, of course. But the attempt to authenticate to the mail server itself takes time if the credentials are bad, and the timeout there is a lot longer. So several attempts add up to a timeout.
I don't know what to do with it, frankly speaking.
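To make the arithmetic concrete, here is a rough sketch of a bounded retry loop, assuming the mail is sent with the standard smtplib. The function names, the number of attempts and the durations are all hypothetical; the dispatcher's actual values are not given in this thread.

```python
import smtplib
import time


def blocking_estimate(n_attempts, attempt_duration, pause):
    """Seconds the caller stays blocked if every attempt fails after attempt_duration."""
    return n_attempts * attempt_duration + (n_attempts - 1) * pause


def send_with_retries(message, host, port, n_attempts=3, attempt_timeout=5.0,
                      pause=0.5, smtp_factory=smtplib.SMTP):
    """Try to send message; return True on success, False after giving up."""
    for attempt in range(n_attempts):
        try:
            # timeout limits each blocking socket operation of this attempt
            with smtp_factory(host, port, timeout=attempt_timeout) as server:
                server.send_message(message)
            return True
        except (smtplib.SMTPException, OSError):
            # pause only between attempts, not after the last one
            if attempt < n_attempts - 1:
                time.sleep(pause)
    return False
```

With, say, 5 attempts that each take 30 s to fail and a 0.5 s pause, blocking_estimate(5, 30.0, 0.5) gives 152.0 seconds, so the pause is negligible and the per-attempt duration dominates. A shorter timeout or fewer attempts shortens the blocking but does not remove it.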
A good way could be to put the email in a queue and process it when possible. But that takes effort. A more generic way is a queue of requests to the dispatcher (that's the "async dispatcher"), but it also causes other issues, such as unclear request prioritization.
Here I'd suggest just making sure that, when this happens during a callback, everything necessary (writing the "done" record) is done before the email is sent, so that only the email fails. If so, it's not a huge problem.
A sync request will not work, but then we do not send an email anyway, right?
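The email queue idea could be sketched roughly as below, using only the standard library. This is not how the dispatcher is implemented. The names email_queue, email_worker and start_email_worker are made up for illustration, and send stands for whatever function actually talks to the mail server.

```python
import queue
import threading

email_queue = queue.Queue()


def email_worker(send, q=email_queue):
    """Send queued emails one by one; a failure is logged and the loop continues."""
    while True:
        item = q.get()
        try:
            if item is None:  # sentinel: stop the worker
                return
            send(item)
        except Exception as exc:
            print(f"email not sent: {exc!r}")
        finally:
            q.task_done()


def start_email_worker(send, q=email_queue):
    """Start the background thread that drains the queue."""
    worker = threading.Thread(target=email_worker, args=(send, q), daemon=True)
    worker.start()
    return worker
```

The request handler would then only call email_queue.put(message) and return, so a slow or failing mail server blocks the worker thread and not the response. Note that such an in-process queue is lost on restart and is separate for each process, which is part of why this is "effort" compared to reordering the callback.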
OK, that is probably not a big change to apply: the status should be written before the email step, which is what potentially causes the issue.
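In code, the ordering discussed here amounts to something like the following sketch. handle_callback, write_status and send_email are hypothetical names standing in for the dispatcher's real callback handling, which is not shown in this thread.

```python
def handle_callback(job_id, status, write_status, send_email):
    """Record the job status first, then attempt the email.

    A failure to send the email is reported but does not undo the record.
    """
    write_status(job_id, status)  # the "done" record is written unconditionally
    try:
        send_email(job_id, status)
        email_sent = True
    except Exception as exc:
        print(f"email for {job_id} not sent: {exc!r}")
        email_sent = False
    return {"job_id": job_id, "status": status, "email_sent": email_sent}
```

The callback still blocks while the email attempts run, so this does not address the timeout; it only guarantees that the email is the one thing that fails.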
In case of status failed? An email is usually sent. But yes, if the sync request fails, no email is sent.
At least with a synchronous dispatcher installation, this will cause a timeout in the frontend and then 503 errors until all attempts to send the email are exhausted.
Can it be fixed, or do we not need to, and the async dispatcher is an option?