Queue jobs until remote services available

bocekm commented 4 years ago

(edited by @lachmanfrantisek)

To make Packit more reliable, it should retry the jobs where possible. We can easily react to a specific exception and put the job back in the celery queue. With this, users do not need to retrigger the job in case of infra problems.

A few notes for the implementation:

Check that the job is idempotent and can be safely retried.
Set a sensible maximum number of retries and provide a meaningful message after the last failed attempt.
Be transparent about the retries -- think about a nice way to let users know that Packit retried the job. Don't spam, but make it visible enough that it does not feel like nothing is happening.
We already have auto-retry in work for all the tasks: https://github.com/packit/packit-service/blob/307974f7bba1f349d5b0f9d0c0d3eaa6e988d2b4/packit_service/worker/tasks.py#L60-L65
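
The notes above can be illustrated with a minimal, standard-library-only sketch. It is not the Packit or Celery implementation: `RemoteServiceUnavailable`, `run_with_retries`, and the `notify` and `sleep` parameters are hypothetical names chosen for illustration. It retries only on one specific exception, caps the number of retries, reports each retry once, and emits a final message after the last failed attempt.

```python
import time
from typing import Callable, Optional


class RemoteServiceUnavailable(Exception):
    """Hypothetical error for an unreachable remote service (e.g. Copr)."""


def run_with_retries(
    job: Callable[[], str],
    max_retries: int = 3,
    base_delay: float = 1.0,
    notify: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Run an idempotent job, retrying only on RemoteServiceUnavailable.

    Returns the job result, or None once all retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return job()
        except RemoteServiceUnavailable as exc:
            if attempt == max_retries:
                # Last attempt failed: tell the user clearly and stop.
                notify(f"Giving up after {max_retries} retries: {exc}")
                return None
            # Exponential backoff: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** attempt
            notify(f"Retry {attempt + 1}/{max_retries} in {delay:g}s: {exc}")
            sleep(delay)
    return None
```

Any other exception propagates immediately, so only errors known to be transient are retried. The job passed in must be idempotent, because it may run more than once.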

Here is a list of the specific tasks:

[x] #1548
[x] #1570
[x] #1571
[x] #1549
[x] #1550
[x] https://github.com/packit/packit/issues/1656
[x] #1551
[x] #1692

Original description:

Upon merging to the main branch of https://github.com/oamg/convert2rhel/, a copr build job is triggered.

Since today, the Copr build is not being created, and Packit reports: "Submit of the build failed: Unable to connect to https://copr.fedorainfracloud.org/api_3/."

bocekm commented 4 years ago

Oh, it seems to be due to https://copr.fedorainfracloud.org/ being down.

TomasTomecek commented 4 years ago

@bocekm thanks for taking the time to open this issue!

Is there something we could do here which would make matters better? One idea we had back then was to keep jobs in a queue (by job I mean the fact you pushed to a branch) and schedule them (initiate the build) once the remote service becomes available. Would such a thing be interesting for you, or is it just better to trigger builds once Copr becomes available again?
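
As a rough illustration of the queueing idea, here is a minimal sketch in plain Python. `DeferredJobQueue` and the `is_available` callback are hypothetical names, not part of Packit: jobs are held while the remote service is down and run in submission order once an availability check passes.

```python
from collections import deque
from typing import Callable, Deque, List


class DeferredJobQueue:
    """Hold jobs while a remote service is down; run them once it is reachable."""

    def __init__(self, is_available: Callable[[], bool]) -> None:
        # is_available is an assumed health check supplied by the caller.
        self._is_available = is_available
        self._pending: Deque[Callable[[], str]] = deque()

    def submit(self, job: Callable[[], str]) -> None:
        """Remember the job; nothing runs until drain() is called."""
        self._pending.append(job)

    def drain(self) -> List[str]:
        """Run pending jobs in FIFO order for as long as the service is up."""
        results: List[str] = []
        while self._pending and self._is_available():
            job = self._pending.popleft()
            results.append(job())
        return results

    def __len__(self) -> int:
        return len(self._pending)
```

In this sketch `drain()` would be called periodically. The availability check is repeated before every job, so jobs stay queued if the service goes down again partway through.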

sentry-io[bot] commented 4 years ago

Sentry issue: RED-HAT-0P-2YQ

sentry-io[bot] commented 4 years ago

Sentry issue: RED-HAT-0P-2YP

bocekm commented 4 years ago

The queue sounds like a good idea and I'd appreciate it, even though it is more like a nice-to-have feature.

stale[bot] commented 3 years ago

This issue has been marked as stale because it hasn't seen any activity for the last 60 days.

Stale issues are closed after 14 days, unless the label is removed by a maintainer or someone comments on it.

This is done in order to ensure that open issues are still relevant.

Thank you for your contribution! :unicorn: :rocket: :robot:

(Note: issues labeled with pinned or EPIC are never marked as stale.)

lachmanfrantisek commented 3 years ago

@packit/the-packit-team do you think this is worth the time spent on this?

TomasTomecek commented 3 years ago

> @packit/the-packit-team do you think this is worth the time spent on this?

afaik we have a dedicated issue for that, right?

lachmanfrantisek commented 3 years ago

@TomasTomecek I know that there are several related ones, but which one do you mean specifically?

(What about having one synced from upstream?)

TomasTomecek commented 3 years ago

https://github.com/packit/packit-service/issues/830

lachmanfrantisek commented 3 years ago

That one is only about GitHub errors...

lachmanfrantisek commented 3 years ago

This is still on our long-term todo list.

bocekm commented 3 years ago

@TomasTomecek, @lachmanfrantisek, feel free to change the title of the issue to something like "Queue jobs until remote services available" and/or edit the description to include Tomas' idea "keep jobs in queue (by job I mean the fact you pushed to a branch) and schedule them (initiate the build) once the remote service becomes available."

Or close this one and create a new one. Up to you.

lachmanfrantisek commented 2 years ago

I've made an EPIC from this issue and created a specific issue for each job/error type.

lbarcziova commented 1 year ago

All the subtasks here were implemented.

packit / packit-service

Queue jobs until remote services available #927