tektoncd / results

Long term storage of execution results.
Apache License 2.0

[WIP] finalizer approach to fix race condition due to pruning in results watcher #703

Open ramessesii2 opened 7 months ago

ramessesii2 commented 7 months ago

Changes

/kind bug

Release Notes

NONE
tekton-robot commented 7 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign sayan-biswas after the PR has been reviewed. You can assign the PR to them by writing `/assign @sayan-biswas` in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/tektoncd/results/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

tekton-robot commented 7 months ago

Hi @ramessesii2. Thanks for your PR.

I'm waiting for a tektoncd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
tekton-robot commented 7 months ago

The following is the coverage report on the affected files. Say /test pull-tekton-results-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/watcher/reconciler/dynamic/dynamic.go | 69.3% | 60.8% | -8.5 |

tekton-robot commented 7 months ago

The following is the coverage report on the affected files. Say /test pull-tekton-results-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/watcher/reconciler/dynamic/dynamic.go | 69.3% | 60.2% | -9.1 |

tekton-robot commented 7 months ago

The following is the coverage report on the affected files. Say /test pull-tekton-results-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/watcher/reconciler/dynamic/dynamic.go | 69.3% | 61.0% | -8.4 |

tekton-robot commented 7 months ago

@ramessesii2: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-tekton-results-build-tests | 52aafe71d0b8659f226c5eb4d4b21e8f88dc1728 | link | true | `/test pull-tekton-results-build-tests` |
| pull-tekton-results-integration-tests | 52aafe71d0b8659f226c5eb4d4b21e8f88dc1728 | link | true | `/test pull-tekton-results-integration-tests` |

ramessesii2 commented 7 months ago

hi @gabemontero @adambkaplan, with finalizers, at least locally, I've been able to fix the race condition. There's a small caveat with the finalizer approach: because we add a finalizer so the PipelineRun does not get pruned until the streaming/sending of logs is done, the finalizer is stored along with the PipelineRun object as well. That might not be a deal breaker (I'm not sure though), but I find it simpler to use this field from LogStatus and simply hold off on pruning until we find `isStored: true`.
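
The finalizer idea described above can be illustrated with a small, self-contained sketch. It is not the code from this PR: `Cluster`, `Run`, and the finalizer name `example.dev/logs-pending` are hypothetical stand-ins, and the toy store only mimics the general Kubernetes rule that an object marked for deletion is removed once its finalizer list is empty.

```go
package main

import "fmt"

// logsFinalizer is a hypothetical finalizer name, used only in this sketch.
const logsFinalizer = "example.dev/logs-pending"

// Run is a simplified stand-in for a PipelineRun or TaskRun.
type Run struct {
	Name       string
	Finalizers []string
	Deleting   bool // stands in for a set metadata.deletionTimestamp
}

// Cluster is a toy object store that mimics how finalizers gate deletion.
type Cluster struct {
	runs map[string]*Run
}

// NewCluster returns an empty toy store.
func NewCluster() *Cluster {
	return &Cluster{runs: map[string]*Run{}}
}

// Create stores a run under its name.
func (c *Cluster) Create(r *Run) { c.runs[r.Name] = r }

// Exists reports whether the run is still present.
func (c *Cluster) Exists(name string) bool {
	_, ok := c.runs[name]
	return ok
}

// AddFinalizer attaches f to the run if it is not already present.
func (c *Cluster) AddFinalizer(name, f string) {
	r, ok := c.runs[name]
	if !ok {
		return
	}
	for _, x := range r.Finalizers {
		if x == f {
			return
		}
	}
	r.Finalizers = append(r.Finalizers, f)
}

// Delete marks the run for deletion; it is removed only once no finalizers remain.
func (c *Cluster) Delete(name string) {
	r, ok := c.runs[name]
	if !ok {
		return
	}
	r.Deleting = true
	c.collect(r)
}

// RemoveFinalizer drops f and completes any pending deletion.
func (c *Cluster) RemoveFinalizer(name, f string) {
	r, ok := c.runs[name]
	if !ok {
		return
	}
	kept := make([]string, 0, len(r.Finalizers))
	for _, x := range r.Finalizers {
		if x != f {
			kept = append(kept, x)
		}
	}
	r.Finalizers = kept
	c.collect(r)
}

// collect removes a run that is marked for deletion and has no finalizers left.
func (c *Cluster) collect(r *Run) {
	if r.Deleting && len(r.Finalizers) == 0 {
		delete(c.runs, r.Name)
	}
}

func main() {
	c := NewCluster()
	c.Create(&Run{Name: "run-1"})
	c.AddFinalizer("run-1", logsFinalizer) // log streaming starts

	c.Delete("run-1") // pruning fires while logs are still streaming
	fmt.Println("after prune, exists:", c.Exists("run-1"))

	c.RemoveFinalizer("run-1", logsFinalizer) // log streaming finished
	fmt.Println("after logs stored, exists:", c.Exists("run-1"))
}
```

In this sketch the run survives the prune request and disappears only when the finalizer is removed. The caveat raised in the comment still applies: the finalizer lives in the object's own metadata, so any copy of the object that gets stored carries it too.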

gabemontero commented 7 months ago

> hi @gabemontero @adambkaplan, with finalizers, at least locally, I've been able to fix the race condition. There's a small caveat with the finalizer approach: because we add a finalizer so the PipelineRun does not get pruned until the streaming/sending of logs is done, the finalizer is stored along with the PipelineRun object as well. That might not be a deal breaker (I'm not sure though), but I find it simpler to use this field from LogStatus and simply hold off on pruning until we find `isStored: true`.

for me at least, I like using IsStored() instead as well @ramessesii2 @adambkaplan @sayan-biswas, assuming the watcher reconciler that handles pruning can watch for that and requeue if it is not yet stored

I believe that is the case based on what I recall from the fix for handling cancelled pipeline/task runs

putting metadata, i.e. the finalizer, in the object we are storing seems more fragile to me in hindsight

what do you all think?

perhaps as part of breaking out the mem leak fix from this one, either create a separate PR or a separate commit in this PR so we can compare IsStored vs. finalizer
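
To make the comparison concrete, here is a minimal sketch of the IsStored-gated alternative: hold pruning and requeue until the logs are reported as stored. Everything in it is an assumption for illustration. `decidePrune`, `Decision`, and the 30-second `recheckInterval` are hypothetical names and values, not the watcher's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// recheckInterval is an assumed requeue delay for this sketch.
const recheckInterval = 30 * time.Second

// Decision is what a hypothetical pruning step would return to its caller.
type Decision struct {
	Prune        bool
	RequeueAfter time.Duration // zero means "do not requeue"
}

// decidePrune holds pruning until the run is finished and its logs are stored.
func decidePrune(runDone, logsStored bool) Decision {
	switch {
	case !runDone:
		// Still running: nothing to prune yet; a later event will trigger another pass.
		return Decision{}
	case !logsStored:
		// Finished, but logs are not stored yet: check again later instead of pruning.
		return Decision{RequeueAfter: recheckInterval}
	default:
		// Finished and logs stored: pruning can go ahead.
		return Decision{Prune: true}
	}
}

func main() {
	fmt.Printf("running:           %+v\n", decidePrune(false, false))
	fmt.Printf("done, logs pending: %+v\n", decidePrune(true, false))
	fmt.Printf("done, logs stored:  %+v\n", decidePrune(true, true))
}
```

Unlike the finalizer approach, nothing is written into the object being stored here; the gate depends only on state that is read back, which is the fragility trade-off discussed above.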

ramessesii2 commented 7 months ago

FYI: #713 uses the Logs API to address the race condition

tekton-robot commented 7 months ago

@ramessesii2: PR needs rebase.