netobserv / netobserv-ebpf-agent

Network Observability eBPF Agent
Apache License 2.0
119 stars, 30 forks

NETOBSERV-1390: Flow RTT values are n/a #245

Closed: msherif1234 closed this pull request 6 months ago

msherif1234 commented 6 months ago

Description

Use a tcp_rcv fentry hook to reuse the RTT that the kernel has already calculated, reading it directly from the TCP socket instead of trying to calculate it in the agent.

Testing

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

openshift-ci-robot commented 6 months ago

@msherif1234: This pull request references NETOBSERV-1390 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/245):

> ## Description
>
> using tcp_rcv kprobe to reuse kernel calculated rtt directly from TCP socket instead of trying to calculate it
>
> ## Dependencies
>
> n/a
>
> ## Checklist
>
> If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
>
> * [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
> * [X] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._
> * [ ] Does this PR require product documentation?
>   * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
> * [ ] Does this PR require a product release notes entry?
>   * [ ] If so, fill in "Release Note Text" in the JIRA.
> * [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
>   * [ ] If so, make sure it is described in the JIRA ticket.
> * QE requirements (check 1 from the list):
>   * [X] Standard QE validation, with pre-merge tests unless stated otherwise.
>   * [ ] Regression tests only (e.g. refactoring with no user-facing change).
>   * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=netobserv%2Fnetobserv-ebpf-agent). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.
codecov[bot] commented 6 months ago

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison is base (cd878fd) 33.62% compared to head (0095be4) 33.86%.

| Files | Patch % | Lines |
|---|---|---|
| pkg/ebpf/tracer.go | 0.00% | 16 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #245      +/-   ##
==========================================
+ Coverage   33.62%   33.86%   +0.23%
==========================================
  Files          39       39
  Lines        3494     3479      -15
==========================================
+ Hits         1175     1178       +3
+ Misses       2251     2233      -18
  Partials       68       68
```

| [Flag](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/245/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/245/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | `33.86% <20.00%> (+0.23%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#carryforward-flags-in-the-pull-request-comment) to find out more.


msherif1234 commented 6 months ago

Comparing main with this PR, both with RTT enabled, there is a CPU increase because of the kprobe processing.


But it looks much better in the NetObserv UI, as long as we filter on TCP and the Ingress direction, because this is an Ingress-only feature. It also works without the need to set sampling to 1; in fact, with high sampling we might not give the new hook a chance to run as often, and we could lose RTT updates.


openshift-ci-robot commented 6 months ago

@msherif1234: This pull request references NETOBSERV-1390 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/245):

> ## Description
>
> using tcp_rcv kprobe and fentry hooks to reuse kernel calculated rtt directly from TCP socket instead of trying to calculate it
>
> ## Testing
> * [X] verified changes with 4.16 image (RHEL9)
> * [X] verified changes with 4.12 image (RHEL8)
> * [ ] run on different arch than `amd64`
>   using cluster-bot `launch 4.16 aws,arm64`
> * [ ] performance and scale run
>
> ## Dependencies
>
> n/a
>
> ## Checklist
>
> If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
>
> * [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
> * [X] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._
> * [ ] Does this PR require product documentation?
>   * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
> * [ ] Does this PR require a product release notes entry?
>   * [ ] If so, fill in "Release Note Text" in the JIRA.
> * [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
>   * [ ] If so, make sure it is described in the JIRA ticket.
> * QE requirements (check 1 from the list):
>   * [X] Standard QE validation, with pre-merge tests unless stated otherwise.
>   * [ ] Regression tests only (e.g. refactoring with no user-facing change).
>   * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).
msherif1234 commented 6 months ago

/ok-to-test

github-actions[bot] commented 6 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:1ebae82

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=1ebae82 make set-agent-image
msherif1234 commented 6 months ago

/ok-to-test

github-actions[bot] commented 6 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:9daf310

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=9daf310 make set-agent-image
openshift-ci-robot commented 6 months ago

@msherif1234: This pull request references NETOBSERV-1390 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/245):

> ## Description
>
> using tcp_rcv kprobe and fentry hooks to reuse kernel calculated rtt directly from TCP socket instead of trying to calculate it
>
> ## Testing
> * [X] verified changes with 4.16 image (RHEL9)
> * [X] verified changes with 4.12 image (RHEL8)
> * [ ] run on different arch than `amd64`
>   using cluster-bot `launch 4.16 aws,arm64`
> * [X] performance and scale run
> https://docs.google.com/spreadsheets/d/1BHoBJb8Pg-SI2bVNG58_NDoIhfOLV44hTysR2zalJAY/edit#gid=1230208449
>
> ## Dependencies
>
> n/a
>
> ## Checklist
>
> If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
>
> * [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
> * [X] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._
> * [ ] Does this PR require product documentation?
>   * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
> * [ ] Does this PR require a product release notes entry?
>   * [ ] If so, fill in "Release Note Text" in the JIRA.
> * [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
>   * [ ] If so, make sure it is described in the JIRA ticket.
> * QE requirements (check 1 from the list):
>   * [X] Standard QE validation, with pre-merge tests unless stated otherwise.
>   * [ ] Regression tests only (e.g. refactoring with no user-facing change).
>   * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).
msherif1234 commented 6 months ago

/ok-to-test

github-actions[bot] commented 6 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:b929c8f

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=b929c8f make set-agent-image
openshift-ci-robot commented 6 months ago

@msherif1234: This pull request references NETOBSERV-1390 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target either version "4.16." or "openshift-4.16.", but it targets "netobserv-1.5" instead.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/245):

> ## Description
>
> using tcp_rcv kprobe and fentry hooks to reuse kernel calculated rtt directly from TCP socket instead of trying to calculate it
>
> ## Testing
> * [X] verified changes with 4.16 image (RHEL9)
> * [X] verified changes with 4.12 image (RHEL8)
> * [ ] run on different arch than `amd64`
>   using cluster-bot `launch 4.16 aws,arm64`
> * [X] performance and scale run
> - 4.15 https://docs.google.com/spreadsheets/d/1BHoBJb8Pg-SI2bVNG58_NDoIhfOLV44hTysR2zalJAY/edit#gid=1230208449
> ## Dependencies
> - 4.14 https://docs.google.com/spreadsheets/d/1sJhlmHVlEHx9ZwfOyz7tVi54c1vtrbr4cW3tdvVXfVk/edit#gid=1878726753
>
> n/a
>
> ## Checklist
>
> If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
>
> * [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
> * [X] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._
> * [ ] Does this PR require product documentation?
>   * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
> * [ ] Does this PR require a product release notes entry?
>   * [ ] If so, fill in "Release Note Text" in the JIRA.
> * [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
>   * [ ] If so, make sure it is described in the JIRA ticket.
> * QE requirements (check 1 from the list):
>   * [X] Standard QE validation, with pre-merge tests unless stated otherwise.
>   * [ ] Regression tests only (e.g. refactoring with no user-facing change).
>   * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).
openshift-ci-robot commented 6 months ago

@msherif1234: This pull request references NETOBSERV-1390 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target either version "4.16." or "openshift-4.16.", but it targets "netobserv-1.5" instead.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/245):

> ## Description
>
> using tcp_rcv kprobe and fentry hooks to reuse kernel calculated rtt directly from TCP socket instead of trying to calculate it
>
> ## Testing
> * [X] verified changes with 4.16 image (RHEL9)
> * [X] verified changes with 4.12 image (RHEL8)
> * [ ] run on different arch than `amd64`
>   using cluster-bot `launch 4.16 aws,arm64`
> * [X] performance and scale run
> - 4.15 https://docs.google.com/spreadsheets/d/1BHoBJb8Pg-SI2bVNG58_NDoIhfOLV44hTysR2zalJAY/edit#gid=1230208449
>
> - 4.14 https://docs.google.com/spreadsheets/d/1sJhlmHVlEHx9ZwfOyz7tVi54c1vtrbr4cW3tdvVXfVk/edit#gid=1878726753
>
> ## Dependencies
>
> n/a
>
> ## Checklist
>
> If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
>
> * [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
> * [X] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._
> * [ ] Does this PR require product documentation?
>   * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
> * [ ] Does this PR require a product release notes entry?
>   * [ ] If so, fill in "Release Note Text" in the JIRA.
> * [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
>   * [ ] If so, make sure it is described in the JIRA ticket.
> * QE requirements (check 1 from the list):
>   * [X] Standard QE validation, with pre-merge tests unless stated otherwise.
>   * [ ] Regression tests only (e.g. refactoring with no user-facing change).
>   * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).
jotak commented 6 months ago

Looks good to me overall - let's also see the perf tests not only for the RTT changes but also because of the libbpf upgrade

msherif1234 commented 6 months ago

> Looks good to me overall - let's also see the perf tests not only for the RTT changes but also because of the libbpf upgrade

The performance numbers are recorded in the PR description. TL;DR: the 1.4 numbers are in sync with the latest automation numbers, and Nathan reviewed them to sanity-check the run config and the results. Still, once I get an LGTM it would probably be good to ask @nathan-weinberg for another run to be 100% sure, because I don't want this to come up at the last minute as it usually does :(

msherif1234 commented 6 months ago

/ok-to-test

github-actions[bot] commented 6 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:fe0d62f

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=fe0d62f make set-agent-image
msherif1234 commented 6 months ago

/ok-to-test

github-actions[bot] commented 6 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:bc220e5

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=bc220e5 make set-agent-image
jotak commented 6 months ago

/lgtm

Amoghrd commented 6 months ago

/label qe-approved

openshift-ci-robot commented 6 months ago

@msherif1234: This pull request references NETOBSERV-1390 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target either version "4.16." or "openshift-4.16.", but it targets "netobserv-1.5" instead.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/245):

> ## Description
>
> using tcp_rcv fentry hook to reuse kernel calculated rtt directly from TCP socket instead of trying to calculate it
>
> ## Testing
> * [X] verified changes with 4.16 image (RHEL9)
> * [X] verified changes with 4.12 image (RHEL8)
> * [ ] run on different arch than `amd64`
>   using cluster-bot `launch 4.16 aws,arm64`
> * [X] performance and scale run
> - 4.15 https://docs.google.com/spreadsheets/d/1BHoBJb8Pg-SI2bVNG58_NDoIhfOLV44hTysR2zalJAY/edit#gid=1230208449
>
> - 4.14 https://docs.google.com/spreadsheets/d/1sJhlmHVlEHx9ZwfOyz7tVi54c1vtrbr4cW3tdvVXfVk/edit#gid=1878726753
>
> ## Dependencies
>
> n/a
>
> ## Checklist
>
> If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.
>
> * [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
> * [X] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._
> * [ ] Does this PR require product documentation?
>   * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
> * [ ] Does this PR require a product release notes entry?
>   * [ ] If so, fill in "Release Note Text" in the JIRA.
> * [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
>   * [ ] If so, make sure it is described in the JIRA ticket.
> * QE requirements (check 1 from the list):
>   * [X] Standard QE validation, with pre-merge tests unless stated otherwise.
>   * [ ] Regression tests only (e.g. refactoring with no user-facing change).
>   * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).
msherif1234 commented 6 months ago

Removed the kprobe and kept only the fentry hook, which is a more advanced and lighter-weight alternative to a kprobe; there is no need for the additional complexity without a real reason.

msherif1234 commented 6 months ago

/ok-to-test

github-actions[bot] commented 6 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:2639b26

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=2639b26 make set-agent-image
Amoghrd commented 6 months ago

Works as expected with the latest commits.

jotak commented 6 months ago

/lgtm

msherif1234 commented 6 months ago

/approve

openshift-ci[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/netobserv/netobserv-ebpf-agent/blob/main/OWNERS)~~ [msherif1234]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.