jimmymathews closed 3 months ago
This also fixes a timing issue that, in exceptional circumstances, could cause the worker containers to remain idle, failing to catch the postgres NOTIFY signals.
This also implements #331, since it is a minor generalization of the timeout already implemented here at the whole-feature level.
This PR does two things:
(1) I was not able to verify that the check noted in #339 was actually "infinite", but it is incorrect regardless, and it is surely related to failing jobs and to the complex logic for requesting that features get computed. This complexity is mostly due to the counts metric being the only one meant to return to clients without any "pending" flag, so that the client does not have to poll. I cleaned up this logic a little and introduced a 5-minute timeout that clears a feature which appears to have no active jobs and is still incomplete (so that it may compute correctly after a new request in the future). I also reduced the number of database connections made by the workers by consolidating them.
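For reviewers, the timeout rule in (1) can be pictured roughly as follows. This is only a sketch of the decision, not the code in this PR: `FeatureStatus`, its fields, and `should_clear` are hypothetical names, and treating "time since last activity" as the clock for the timeout is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The 5 minute value comes from the description; everything else is illustrative.
STALE_TIMEOUT = timedelta(minutes=5)


@dataclass
class FeatureStatus:
    """Hypothetical snapshot of one feature's computation state."""
    feature_id: int
    expected_values: int   # number of values the feature should have when done
    computed_values: int   # number of values computed so far
    active_jobs: int       # jobs currently queued or running for this feature
    last_activity: datetime


def should_clear(status: FeatureStatus, now: datetime,
                 timeout: timedelta = STALE_TIMEOUT) -> bool:
    """Return True when the feature looks abandoned and should be cleared.

    A feature is cleared only if it is incomplete, has no active jobs,
    and has shown no activity for at least `timeout`.
    """
    incomplete = status.computed_values < status.expected_values
    idle = status.active_jobs == 0
    timed_out = now - status.last_activity >= timeout
    return incomplete and idle and timed_out
```

Clearing rather than retrying in place means that a later client request starts the feature from a clean state, which matches the intent stated above.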
(2) The `ADIFeaturesUploader` is now only used in one place, but when the schema was changed slightly to use more autoincrementing identifiers, this one usage was not updated, leading to certain errors. This is now updated.
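To illustrate the kind of mismatch described in (2): once a column is autoincrementing, the caller should stop supplying its own identifier and instead read back the one the database assigned. The sketch below uses the standard-library `sqlite3` module as a stand-in for the real database, and the table and column names are hypothetical, not taken from the actual schema.

```python
import sqlite3


def make_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical autoincrementing table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE feature_specification ("
        "identifier INTEGER PRIMARY KEY AUTOINCREMENT, "
        "derivation_method TEXT NOT NULL)"
    )
    return connection


def insert_feature_specification(connection: sqlite3.Connection,
                                 derivation_method: str) -> int:
    """Insert a row without an explicit identifier and return the assigned one."""
    cursor = connection.execute(
        "INSERT INTO feature_specification (derivation_method) VALUES (?)",
        (derivation_method,),
    )
    connection.commit()
    # The database chose the identifier; the caller only reads it back.
    return cursor.lastrowid
```

With a PostgreSQL backend the same idea would typically be expressed with an `INSERT ... RETURNING` statement rather than `lastrowid`.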