kaspar030 / dwq

Disque Work Queue
GNU General Public License v3.0

Make even otherwise uncaught errors propagate out through dwq #4

Open chrysn opened 2 years ago

chrysn commented 2 years ago

Now with text:

This helps track down the current silent murdock failures, and should let us find the next iteration of them without runner admins having to look into their consoles.

If any "uncaught exception" error occurs, this takes just about everything that looks like a current job (carefully peeking at locals, because the exception could predate the acquisition of a job, in which case there is no back-propagation path and it's console errors only again) and subjects each of them to the usual check. If a job hasn't been NACKed often enough, it is simply NACKed once more for retry. So far, such jobs just stayed unacknowledged forever and were sent to more workers, whereupon the final worker either crashed before evaluating the NACKs or handed the job back unfinished before crashing itself. If the job was NACKed by enough workers (now, e.g., by crashing them), the exception is set as the result on the job.
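To illustrate the flow described above, here is a minimal sketch. It is not the code in this PR: `FakeJob`, `MAX_NACKS`, `find_current_jobs`, `handle_uncaught`, and the result dictionary are hypothetical stand-ins, and the real dwq and Disque calls are deliberately left out. It only shows the idea of walking the traceback, peeking at each frame's locals for job-like objects, and then either NACKing or failing each job.

```python
import sys
import traceback

MAX_NACKS = 2  # hypothetical retry limit, not a dwq setting


class FakeJob:
    """Hypothetical stand-in for a queue job; not the dwq or Disque API."""

    def __init__(self, job_id, nacks=0):
        self.job_id = job_id
        self.nacks = nacks
        self.result = None
        self.requeued = False

    def nack(self):
        # Hand the job back to the queue for another attempt.
        self.nacks += 1
        self.requeued = True

    def set_result(self, result):
        self.result = result


def find_current_jobs(tb):
    """Collect everything that looks like a job from the locals of each frame."""
    jobs = []
    while tb is not None:
        for value in tb.tb_frame.f_locals.values():
            if isinstance(value, FakeJob) and value not in jobs:
                jobs.append(value)
        tb = tb.tb_next
    return jobs


def handle_uncaught(exc_type, exc, tb):
    """NACK jobs that still have retries left; fail the others with the traceback."""
    text = "".join(traceback.format_exception(exc_type, exc, tb))
    for job in find_current_jobs(tb):
        if job.nacks < MAX_NACKS:
            job.nack()
        else:
            job.set_result({"status": "error", "output": text})


def crashing_worker(job):
    raise RuntimeError("boom")


def run_once(job):
    """Simulate one worker hitting an otherwise uncaught error while holding `job`."""
    try:
        crashing_worker(job)
    except RuntimeError:
        handle_uncaught(*sys.exc_info())
```

A handler with this signature could also be assigned to `sys.excepthook`; whether that matches how the PR hooks in is an assumption. If the exception predates the acquisition of a job, `find_current_jobs` returns an empty list and nothing is propagated, which is the "console errors only" case mentioned above.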

As exceptions may also happen somewhere in between jobs, none of this is an exact science, so a warning is added above the backtrace saying that it was assigned to "any of the current jobs".
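As a rough sketch of what such a result text could look like (the wording of `WARNING` and the helper name `describe_uncaught` are made up for illustration, not taken from the PR):

```python
import traceback

# Hypothetical wording; the actual warning text in the PR may differ.
WARNING = (
    "Uncaught exception in worker; it was assigned to any of the current jobs "
    "and may not have been caused by this one.\n"
)


def describe_uncaught(exc):
    """Return the warning followed by the formatted traceback of `exc`."""
    tb_text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return WARNING + tb_text
```

`traceback.format_exception` emits file names, line numbers, source lines, and the exception message, but not the values of local variables.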

Security caveat:

This posts arbitrary tracebacks up to the queue as results (without the values of local variables that Django's debug mode would show). I think we're fine with that, as no confidential code is running anywhere here.

chrysn commented 2 years ago

Testing procedure that happened on my side:

chrysn commented 2 years ago

Squashed in my late fixup (in tests, errors turned out to be ugly with %r and readable with %s); I hope it's usable that way already.
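For anyone wondering about the difference, a small illustration, assuming the value being formatted is a multi-line traceback string (the sample text is made up):

```python
# Made-up sample standing in for a formatted traceback.
text = (
    "Traceback (most recent call last):\n"
    '  File "worker.py", line 1, in <module>\n'
    "RuntimeError: boom"
)

as_repr = "%r" % text  # one long line, newlines shown as literal \n escapes
as_str = "%s" % text   # the text itself, spread over three lines

print(as_repr)
print(as_str)
```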