netobserv / netobserv-ebpf-agent

Network Observability eBPF Agent
Apache License 2.0

NETOBSERV-1697: Add retry around netlinkSubscribeAt #358

Closed msherif1234 closed 1 week ago

msherif1234 commented 1 week ago

Description

While a netns is being set up, we noticed that the associated netnsHandle keeps changing for some time before it becomes stable, and only at that point does netlinkSubscribeAt succeed.

So this PR adds a retry loop and avoids creating the netnsHandle early, deferring it until things are more stable.
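
For readers unfamiliar with the pattern, here is a minimal sketch of a retry loop of the kind described above. It is not the code from this PR: `retrySubscribe`, `errNotReady`, and the doubling delay are hypothetical names and choices made for illustration, and the `subscribe` callback stands in for whatever opens the namespace handle and performs the netlink subscription. The callback is expected to open the handle itself on every attempt, so no handle is created before the namespace has settled.

```go
package main

import (
	"errors"
	"time"
)

// errNotReady is a hypothetical error standing in for whatever a subscription
// attempt returns while the namespace handle is still changing.
var errNotReady = errors.New("netns handle not stable yet")

// retrySubscribe calls subscribe up to attempts times. After each failed
// attempt except the last it sleeps for delay and then doubles the delay.
// It returns nil on the first success, or the last error once all attempts
// have failed.
func retrySubscribe(attempts int, delay time.Duration, subscribe func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 0; i < attempts; i++ {
		// The callback opens the handle and subscribes in one step, so a
		// handle obtained during a failed attempt is never reused.
		if err = subscribe(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return err
}
```

A caller would pass a closure that opens the namespace handle and subscribes, for example `retrySubscribe(5, 100*time.Millisecond, func() error { ... })`. The attempt count and delay shown here are arbitrary, not values taken from the PR.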

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

openshift-ci-robot commented 1 week ago

@msherif1234: This pull request references NETOBSERV-1697 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/358):

>## Description
>
>
>## Dependencies
>
>n/a
>
>## Checklist
>
>If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
>
>* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
>* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._
>* [ ] Does this PR require product documentation?
>  * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
>* [ ] Does this PR require a product release notes entry?
>  * [ ] If so, fill in "Release Note Text" in the JIRA.
>* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
>  * [ ] If so, make sure it is described in the JIRA ticket.
>* QE requirements (check 1 from the list):
>  * [ ] Standard QE validation, with pre-merge tests unless stated otherwise.
>  * [ ] Regression tests only (e.g. refactoring with no user-facing change).
>  * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=netobserv%2Fnetobserv-ebpf-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 43.47826% with 52 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@fdebe3f). Learn more about missing BASE report.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #358   +/-   ##
=======================================
  Coverage        ?   33.38%
=======================================
  Files           ?       48
  Lines           ?     3531
  Branches        ?        0
=======================================
  Hits            ?     1179
  Misses          ?     2251
  Partials        ?      101
```

| [Flag](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/358/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/358/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | `33.38% <43.47%> (?)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/358?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | Coverage Δ | |
|---|---|---|
| [pkg/ifaces/informer.go](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/358?src=pr&el=tree&filepath=pkg%2Fifaces%2Finformer.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#diff-cGtnL2lmYWNlcy9pbmZvcm1lci5nbw==) | `0.00% <0.00%> (ø)` | |
| [pkg/ifaces/poller.go](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/358?src=pr&el=tree&filepath=pkg%2Fifaces%2Fpoller.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#diff-cGtnL2lmYWNlcy9wb2xsZXIuZ28=) | `82.45% <67.74%> (ø)` | |
| [pkg/ebpf/tracer.go](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/358?src=pr&el=tree&filepath=pkg%2Febpf%2Ftracer.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#diff-cGtnL2VicGYvdHJhY2VyLmdv) | `0.00% <0.00%> (ø)` | |
| [pkg/ifaces/watcher.go](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/358?src=pr&el=tree&filepath=pkg%2Fifaces%2Fwatcher.go&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#diff-cGtnL2lmYWNlcy93YXRjaGVyLmdv) | `61.29% <51.35%> (ø)` | |

msherif1234 commented 1 week ago

/ok-to-test

github-actions[bot] commented 1 week ago

New image: quay.io/netobserv/netobserv-ebpf-agent:f437640

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=f437640 make set-agent-image
msherif1234 commented 1 week ago

/ok-to-test

github-actions[bot] commented 1 week ago

New image: quay.io/netobserv/netobserv-ebpf-agent:86853c3

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=86853c3 make set-agent-image
msherif1234 commented 1 week ago

/ok-to-test

github-actions[bot] commented 1 week ago

New image: quay.io/netobserv/netobserv-ebpf-agent:66f70cd

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=66f70cd make set-agent-image
openshift-ci[bot] commented 1 week ago

@anfredette: changing LGTM is restricted to collaborators

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/358#pullrequestreview-2142758887):

>This seems to work with my suggested change for the Done channel used for linkSubscriberAt().

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

openshift-ci[bot] commented 1 week ago

@anfredette: changing LGTM is restricted to collaborators

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/358#pullrequestreview-2142789581):

>LGTM

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.

msherif1234 commented 1 week ago

/ok-to-test

github-actions[bot] commented 1 week ago

New image: quay.io/netobserv/netobserv-ebpf-agent:ae956e1

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=ae956e1 make set-agent-image
jotak commented 1 week ago

Thanks! /lgtm

msherif1234 commented 1 week ago

/ok-to-test

github-actions[bot] commented 1 week ago

New image: quay.io/netobserv/netobserv-ebpf-agent:d8a6aed

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=d8a6aed make set-agent-image
memodi commented 1 week ago

/label qe-approved

msherif1234 commented 1 week ago

/ok-to-test

github-actions[bot] commented 1 week ago

New image: quay.io/netobserv/netobserv-ebpf-agent:53fa344

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=53fa344 make set-agent-image
jotak commented 1 week ago

/lgtm

msherif1234 commented 1 week ago

/approve

openshift-ci[bot] commented 1 week ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/netobserv/netobserv-ebpf-agent/blob/main/OWNERS)~~ [msherif1234]

Approvers can indicate their approval by writing `/approve` in a comment

Approvers can cancel approval by writing `/approve cancel` in a comment

memodi commented 1 week ago

@msherif1234 - new commits were added after testing was completed. Are those additional fixes?