netobserv / netobserv-ebpf-agent

Network Observability eBPF Agent
Apache License 2.0
127 stars 32 forks source link

NETOBSERV-559: use LookupAndDelete to read maps #283

Closed jotak closed 6 months ago

jotak commented 7 months ago

Description

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

github-actions[bot] commented 7 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:c5b2f78

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=c5b2f78 make set-agent-image
jotak commented 7 months ago

my perf numbers look unsanely awesome I need to check with NDH https://docs.google.com/spreadsheets/d/1qakBaK1dk_rERO30k1cSR4W-Nn0SXW4A3lqQ1sZC4rE/edit#gid=1192055209 image

codecov[bot] commented 7 months ago

Codecov Report

Attention: Patch coverage is 5.26316% with 108 lines in your changes are missing coverage. Please review.

Project coverage is 36.32%. Comparing base (58d01d9) to head (da2f5e5).

Files Patch % Lines
pkg/ebpf/tracer.go 0.00% 69 Missing :warning:
pkg/ebpf/tracer_legacy.go 0.00% 36 Missing :warning:
pkg/utils/utils.go 66.66% 2 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #283 +/- ## ========================================== - Coverage 36.74% 36.32% -0.42% ========================================== Files 42 43 +1 Lines 3813 3879 +66 ========================================== + Hits 1401 1409 +8 - Misses 2334 2391 +57 - Partials 78 79 +1 ``` | [Flag](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/283/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/283/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | `36.32% <5.26%> (-0.42%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jotak commented 7 months ago

@msherif1234 we really really really need to look at this. The perfs are insane, or I should say, our current version is really wrong, much more than I thought, with LookupAndDelete. This PR must not be merged as is (flows could be missed between the lookup and the delete) but we need to find a solution to get this level of performance without affecting correctness

openshift-ci-robot commented 7 months ago

@jotak: This pull request references NETOBSERV-559 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/283): >This is for performance testing only. Flows might be dropped due to not locking map between lookup and delete > >## Description > > > >## Dependencies > > >n/a > >## Checklist > >If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that. > >* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist. >* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._ >* [ ] Does this PR require product documentation? > * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs. >* [ ] Does this PR require a product release notes entry? > * [ ] If so, fill in "Release Note Text" in the JIRA. >* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc. > * [ ] If so, make sure it is described in the JIRA ticket. >* QE requirements (check 1 from the list): > * [ ] Standard QE validation, with pre-merge tests unless stated otherwise. > * [ ] Regression tests only (e.g. refactoring with no user-facing change). > * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team). > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=netobserv%2Fnetobserv-ebpf-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
openshift-ci-robot commented 7 months ago

@jotak: This pull request references NETOBSERV-559 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/283): >## Description > >- 2-passes LookupAndDelete: first to get ids, then for atomic Lookup+Delete. >- Keep legacy version for old kernels (<4.20) >- The terrible performance of the previous implementation seems due to deleting while iterating > >## Dependencies > > >n/a > >## Checklist > >If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that. > >* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist. >* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._ >* [ ] Does this PR require product documentation? > * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs. >* [ ] Does this PR require a product release notes entry? > * [ ] If so, fill in "Release Note Text" in the JIRA. >* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc. > * [ ] If so, make sure it is described in the JIRA ticket. >* QE requirements (check 1 from the list): > * [ ] Standard QE validation, with pre-merge tests unless stated otherwise. > * [ ] Regression tests only (e.g. refactoring with no user-facing change). > * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team). > Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=netobserv%2Fnetobserv-ebpf-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
jotak commented 7 months ago

Another set of tests still shows much much improved performances: https://docs.google.com/spreadsheets/d/1qakBaK1dk_rERO30k1cSR4W-Nn0SXW4A3lqQ1sZC4rE/edit#gid=807334756

image

msherif1234 commented 7 months ago

Thanks @jotak changes looks good to me pls run with large file with sampling of 1 to be certain we don't miss any flows /lgtm

msherif1234 commented 7 months ago

/LGTM

jotak commented 7 months ago

@msherif1234 I addressed the feedback:

Capture d’écran du 2024-03-04 16-26-19

Also I wanted to make sure that there was no dup id anymore in the map when iterating in LookupAndDelete, so this is confirmed with the new metrics where hasmap-unique equals hashmap-total Capture d’écran du 2024-03-04 16-27-10

msherif1234 commented 7 months ago

/lgtm

jotak commented 7 months ago

/approve

msherif1234 commented 7 months ago

/ok-to-test

github-actions[bot] commented 7 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:94cfb8c

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=94cfb8c make set-agent-image
openshift-ci[bot] commented 6 months ago

New changes are detected. LGTM label has been removed.

jotak commented 6 months ago

(rebased)

github-actions[bot] commented 6 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:a0238fe

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=a0238fe make set-agent-image
jotak commented 6 months ago

/approve

openshift-ci[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/netobserv/netobserv-ebpf-agent/blob/main/OWNERS)~~ [jotak] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
nathan-weinberg commented 6 months ago

Note this was merged despite regression test failures without QE approval