netobserv / netobserv-ebpf-agent

Network Observability eBPF Agent
Apache License 2.0

NETOBSERV-1379: enhance DNS debugging to dbg DNS over TCP with NA fields #218

Closed msherif1234 closed 8 months ago

msherif1234 commented 9 months ago

Description

With DNS over TCP, we noticed that the TCP handshake packets are too small to contain any DNS data, so the DNS fields in the UI show as n/a, which is correct.

This PR propagates the DNS lookup return code so that, when enriching flows with DNS info, we can distinguish expected cases such as the TCP handshake from real errors.
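A minimal Go sketch of the idea, assuming a hypothetical `dnsRecord` type whose fields mirror the `dns_record` output in this PR (the agent's real types, e.g. in pkg/flow/record.go, may differ). A non-zero errno marks a flow whose DNS header could not be read, so a consumer can show n/a together with a reason.

```go
package main

import "fmt"

// dnsRecord is a hypothetical mirror of the "dns_record" fields shown in
// this PR; it is not the agent's actual type.
type dnsRecord struct {
	ID      uint16
	Flags   uint16
	Latency uint64
	Errno   uint8 // 0 means the DNS header was read successfully
}

// dnsStatus explains why DNS fields may be empty. Without Errno, an
// all-zero record gives no hint of why the DNS fields are empty.
func dnsStatus(r dnsRecord) string {
	if r.Errno == 0 {
		return "ok"
	}
	return fmt.Sprintf("n/a (errno %d)", r.Errno)
}

func main() {
	fmt.Println(dnsStatus(dnsRecord{ID: 40514, Flags: 34176, Latency: 185794}))
	fmt.Println(dnsStatus(dnsRecord{Errno: 7}))
}
```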

When we get N/A, it is because bpf_skb_load_bytes() returned an error:
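To illustrate the failure mode, here is a loose user-space Go analogue of that bounds check. It is a sketch only: `loadBytes` is hypothetical, while the real bpf_skb_load_bytes() runs in the kernel and returns a negative error on failure. It also assumes that the `offset` field in the records below (68) is where the DNS header would start; reading a 12-byte header at that offset from a 66-byte packet cannot succeed.

```go
package main

import (
	"errors"
	"fmt"
)

// errShortPacket is a hypothetical error used only in this sketch.
var errShortPacket = errors.New("read past end of packet")

// loadBytes copies len(to) bytes of pkt starting at offset. If the range does
// not fit inside pkt, it zeroes `to` and returns an error. This is a loose
// user-space analogue of bpf_skb_load_bytes(), not the kernel helper itself.
func loadBytes(pkt []byte, offset int, to []byte) error {
	if offset < 0 || offset+len(to) > len(pkt) {
		for i := range to {
			to[i] = 0
		}
		return errShortPacket
	}
	copy(to, pkt[offset:offset+len(to)])
	return nil
}

func main() {
	const dnsOffset = 68    // assumed: where the DNS header would start
	hdr := make([]byte, 12) // fixed-size DNS header (RFC 1035)

	fmt.Println(loadBytes(make([]byte, 66), dnsOffset, hdr))  // handshake-sized packet
	fmt.Println(loadBytes(make([]byte, 291), dnsOffset, hdr)) // response-sized packet
}
```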

Working
=======
"dns_record": {
    "id": 40514,
    "flags": 34176,
    "latency": 185794,
    "errno": 0,
    "offset": 68,
    "tcp_len": 34,
    "skb_len": 291
},

Not working
===========
"dns_record": {
    "id": 0,
    "flags": 0,
    "latency": 0,
    "errno": 7,
    "offset": 68,
    "tcp_len": 34,
    "skb_len": 66
},
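The two records can be loaded and compared with a small Go program. This is a sketch: `dnsRecordJSON` and `parse` are hypothetical names, only the JSON keys are copied from the output in this PR, and the surrounding `"dns_record":` wrapper is dropped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dnsRecordJSON mirrors the keys of the "dns_record" objects in this PR;
// the type name is hypothetical.
type dnsRecordJSON struct {
	ID      uint16 `json:"id"`
	Flags   uint16 `json:"flags"`
	Latency uint64 `json:"latency"`
	Errno   uint8  `json:"errno"`
	Offset  int    `json:"offset"`
	TCPLen  int    `json:"tcp_len"`
	SkbLen  int    `json:"skb_len"`
}

const working = `{"id": 40514, "flags": 34176, "latency": 185794, "errno": 0, "offset": 68, "tcp_len": 34, "skb_len": 291}`
const notWorking = `{"id": 0, "flags": 0, "latency": 0, "errno": 7, "offset": 68, "tcp_len": 34, "skb_len": 66}`

func parse(s string) (dnsRecordJSON, error) {
	var r dnsRecordJSON
	err := json.Unmarshal([]byte(s), &r)
	return r, err
}

func main() {
	for _, s := range []string{working, notWorking} {
		r, err := parse(s)
		if err != nil {
			panic(err)
		}
		// A non-zero errno accompanies the all-zero id/flags/latency.
		fmt.Printf("id=%d errno=%d skb_len=%d\n", r.ID, r.Errno, r.SkbLen)
	}
}
```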

In the non-working case, the received packet is very small (skb_len is 66 bytes, versus 291 in the working case), which is why we can't fetch the DNS header info.
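For reference, the DNS header is the fixed 12-byte header from RFC 1035 (section 4.1.1), and over TCP each message is preceded by a 2-byte length field (RFC 1035, section 4.2.2). The Go sketch below decodes the flags value 34176 (0x8580) from the working record and shows that a payload shorter than prefix + header cannot be parsed. The names `decodeFlags` and `parseTCPDNSHeader` are hypothetical, and the sketch assumes the payload starts at the length prefix.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// dnsFlags holds the bit fields of the 16-bit DNS header flags word.
type dnsFlags struct {
	QR     bool  // response (true) or query (false)
	Opcode uint8 // 4-bit operation code
	AA     bool  // authoritative answer
	TC     bool  // truncated
	RD     bool  // recursion desired
	RA     bool  // recursion available
	Rcode  uint8 // 4-bit response code, 0 = NOERROR
}

func decodeFlags(f uint16) dnsFlags {
	return dnsFlags{
		QR:     f&0x8000 != 0,
		Opcode: uint8(f>>11) & 0x0F,
		AA:     f&0x0400 != 0,
		TC:     f&0x0200 != 0,
		RD:     f&0x0100 != 0,
		RA:     f&0x0080 != 0,
		Rcode:  uint8(f & 0x000F),
	}
}

// parseTCPDNSHeader skips the 2-byte TCP length prefix and reads ID and flags.
// It returns ok=false when the payload is too short to hold a header.
func parseTCPDNSHeader(payload []byte) (id uint16, flags dnsFlags, ok bool) {
	const prefix, hdrLen = 2, 12
	if len(payload) < prefix+hdrLen {
		return 0, dnsFlags{}, false
	}
	h := payload[prefix:]
	return binary.BigEndian.Uint16(h[0:2]), decodeFlags(binary.BigEndian.Uint16(h[2:4])), true
}

func main() {
	fmt.Printf("%+v\n", decodeFlags(34176))

	// Hypothetical payload: length prefix 12, then ID=40514 (0x9E42), flags=0x8580.
	payload := []byte{0x00, 0x0C, 0x9E, 0x42, 0x85, 0x80, 0, 0, 0, 0, 0, 0, 0, 0}
	id, f, ok := parseTCPDNSHeader(payload)
	fmt.Println(id, f.Rcode, ok)

	_, _, ok = parseTCPDNSHeader(nil) // a handshake segment carries no payload
	fmt.Println(ok)
}
```

Decoded this way, 0x8580 is a response (QR) that is authoritative (AA) with recursion desired and available (RD, RA) and RCODE 0.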

Those small packets belong to the TCP handshake, and by definition they carry no payload data. This is clear from the pcap:

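![image (1)](https://github.com/netobserv/netobserv-ebpf-agent/assets/12748167/866f0452-b44d-4a6f-a688-671db15ccd9b)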
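That handshake segments leave nothing to parse can also be checked from the headers alone. The TCP payload length is the IPv4 total length minus the IPv4 header length (IHL × 4) minus the TCP header length (data offset × 4). Below is a minimal Go sketch, assuming an IPv4 packet without its Ethernet header. The SYN bytes are made up for illustration. A SYN with 12 bytes of TCP options is 20 + 32 = 52 bytes at the IP layer, or 66 bytes with a 14-byte Ethernet header. That happens to equal skb_len in the failing record, although the options in the captured packets are not known here.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const tcpFlagSYN = 0x02

// tcpPayloadLen returns the number of TCP payload bytes in an IPv4 packet
// (no link-layer header) together with the TCP flags byte.
func tcpPayloadLen(pkt []byte) (payload int, flags byte, err error) {
	if len(pkt) < 20 || pkt[0]>>4 != 4 {
		return 0, 0, errors.New("not an IPv4 packet")
	}
	ihl := int(pkt[0]&0x0F) * 4
	total := int(binary.BigEndian.Uint16(pkt[2:4]))
	if pkt[9] != 6 || len(pkt) < ihl+20 {
		return 0, 0, errors.New("not a TCP segment")
	}
	tcp := pkt[ihl:]
	dataOff := int(tcp[12]>>4) * 4
	return total - ihl - dataOff, tcp[13], nil
}

// synPacket builds a hypothetical 52-byte SYN: a 20-byte IPv4 header plus a
// 32-byte TCP header (20 bytes + 12 bytes of options, left zeroed here).
func synPacket() []byte {
	pkt := make([]byte, 52)
	pkt[0] = 0x45                            // IPv4, IHL = 5 (20 bytes)
	binary.BigEndian.PutUint16(pkt[2:4], 52) // IPv4 total length
	pkt[9] = 6                               // protocol = TCP
	pkt[20+12] = 8 << 4                      // TCP data offset = 8 words (32 bytes)
	pkt[20+13] = tcpFlagSYN
	return pkt
}

func main() {
	n, flags, err := tcpPayloadLen(synPacket())
	if err != nil {
		panic(err)
	}
	fmt.Println(n, flags&tcpFlagSYN != 0)
}
```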

Dependencies

https://github.com/netobserv/network-observability-console-plugin/pull/425
https://github.com/netobserv/flowlogs-pipeline/pull/533

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:0fd928e

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=0fd928e make set-agent-image

codecov[bot] commented 9 months ago

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (0891b34) 31.79% compared to head (d66c036) 31.84%. Report is 1 commit behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #218      +/-   ##
==========================================
+ Coverage   31.79%   31.84%   +0.05%
==========================================
  Files          37       37
  Lines        3362     3369       +7
==========================================
+ Hits         1069     1073       +4
- Misses       2230     2232       +2
- Partials       63       64       +1
```

| [Flag](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/218/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/218/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | `31.84% <57.14%> (+0.05%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | Coverage Δ | |
|---|---|---|
| [pkg/decode/decode\_protobuf.go](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#diff-cGtnL2RlY29kZS9kZWNvZGVfcHJvdG9idWYuZ28=) | `29.59% <100.00%> (+0.44%)` | :arrow_up: |
| [pkg/exporter/proto.go](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#diff-cGtnL2V4cG9ydGVyL3Byb3RvLmdv) | `95.27% <100.00%> (+0.07%)` | :arrow_up: |
| [pkg/flow/record.go](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/218?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#diff-cGtnL2Zsb3cvcmVjb3JkLmdv) | `55.22% <0.00%> (-2.59%)` | :arrow_down: |

:umbrella: View full report in Codecov by Sentry.
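The Codecov percentages can be reproduced from the counts in the coverage diff above. Coverage is hits divided by lines, and partials do not count as covered. The reported values are consistent with rounding down to two decimals rather than rounding to nearest: 1069/3362 is 31.7965...%, shown as 31.79%. The 57.14% patch figure is likewise consistent with 4 of 7 changed lines being covered, given that 3 lines are reported as missing coverage. Below is a small Go check, using a hypothetical helper `pct`:

```go
package main

import "fmt"

// pct formats hits/lines as a percentage rounded down to two decimals,
// using integer arithmetic to avoid floating-point rounding surprises.
func pct(hits, lines int) string {
	v := hits * 10000 / lines // hundredths of a percent, truncated
	return fmt.Sprintf("%d.%02d%%", v/100, v%100)
}

func main() {
	fmt.Println(pct(1069, 3362)) // base: 31.79%
	fmt.Println(pct(1073, 3369)) // head: 31.84%
	fmt.Println(pct(4, 7))       // patch, assuming 4 of 7 changed lines covered: 57.14%
}
```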

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:a7e6749

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=a7e6749 make set-agent-image

msherif1234 commented 9 months ago

/ok-to-test

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:ed3a6a6

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=ed3a6a6 make set-agent-image

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:35e2cc7

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=35e2cc7 make set-agent-image

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:075fb69

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=075fb69 make set-agent-image

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:e32f956

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=e32f956 make set-agent-image

openshift-ci-robot commented 9 months ago

@msherif1234: This pull request references NETOBSERV-1379 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.15.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/218): >## Description > > > >## Dependencies > > >n/a > >## Checklist > >If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that. > >* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist. >* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._ >* [ ] Does this PR require product documentation? > * [ ] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs. >* [ ] Does this PR require a product release notes entry? > * [ ] If so, fill in "Release Note Text" in the JIRA. >* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc. > * [ ] If so, make sure it is described in the JIRA ticket. >* QE requirements (check 1 from the list): > * [ ] Standard QE validation, with pre-merge tests unless stated otherwise. > * [ ] Regression tests only (e.g. refactoring with no user-facing change). > * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team). > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci-robot commented 9 months ago

@msherif1234: This pull request references NETOBSERV-1379 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.15.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/218): >## Description > >it was noticed with DNS over TCP that the handshake packets are too small and it doesn't contain any DNS data from UI DNS fields showed as `n/a` which is true > >this PR propagate the DNS lookup return code so we can differentiate between cases like TCP handshake case and real errors enriching with DNS info. > >## Dependencies > > >n/a > >## Checklist > >If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that. > >* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist. >* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._ >* [x] Does this PR require product documentation? > * [x] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs. >* [x] Does this PR require a product release notes entry? > * [ ] If so, fill in "Release Note Text" in the JIRA. >* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc. > * [ ] If so, make sure it is described in the JIRA ticket. >* QE requirements (check 1 from the list): > * [x] Standard QE validation, with pre-merge tests unless stated otherwise. > * [ ] Regression tests only (e.g. refactoring with no user-facing change). > * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team). 
> Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
openshift-ci-robot commented 9 months ago

@msherif1234: This pull request references NETOBSERV-1379 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.15.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/218): >## Description > >it was noticed with DNS over TCP that the handshake packets are too small and it doesn't contain any DNS data from UI DNS fields showed as `n/a` which is true > >this PR propagate the DNS lookup return code so we can differentiate between cases like TCP handshake case and real errors enriching with DNS info. > >when we get N/A `bpf_skb_load_bytes()` return an error >```golang >working >====== > "dns_record": { > "id": 40514, > "flags": 34176, > "latency": 185794, > "errno": 0, > "offset": 68, > "tcp_len": 34, > "skb_len": 291 > }, > >none working >========== > "dns_record": { > "id": 0, > "flags": 0, > "latency": 0, > "errno": 7, > "offset": 68, > "tcp_len": 34, > "skb_len": 66 > }, > >``` >in the none working case the received packet is very small that is why we can't fetch DNS header info. > >Those small packets are TCP handshake and those packets don't have data by definition this clear when u look at the pcap > >![image (1)](https://github.com/netobserv/netobserv-ebpf-agent/assets/12748167/866f0452-b44d-4a6f-a688-671db15ccd9b) > >## Dependencies > > >n/a > >## Checklist > >If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that. > >* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist. >* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._ >* [x] Does this PR require product documentation? > * [x] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. 
Any required step to activate or configure the feature should be documented there, such as new CRD knobs. >* [x] Does this PR require a product release notes entry? > * [ ] If so, fill in "Release Note Text" in the JIRA. >* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc. > * [ ] If so, make sure it is described in the JIRA ticket. >* QE requirements (check 1 from the list): > * [x] Standard QE validation, with pre-merge tests unless stated otherwise. > * [ ] Regression tests only (e.g. refactoring with no user-facing change). > * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team). > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
jpinsonneau commented 9 months ago

@msherif1234, are these the error codes we should expect? https://elixir.bootlin.com/linux/v4.7/source/include/uapi/asm-generic/errno-base.h#L10

msherif1234 commented 9 months ago

> @msherif1234, are these the error codes we should expect? https://elixir.bootlin.com/linux/v4.7/source/include/uapi/asm-generic/errno-base.h#L10

Correct
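Since the errno field carries standard Linux errno values, a consumer can translate it into a symbolic name. Below is a minimal Go sketch with a hand-copied subset of include/uapi/asm-generic/errno-base.h. `errnoName` is a hypothetical helper, not part of the agent or the console plugin. On Linux, Go's `syscall.Errno(n).Error()` also yields the corresponding message string.

```go
package main

import "fmt"

// errnoBase is a subset of include/uapi/asm-generic/errno-base.h.
var errnoBase = map[uint8]string{
	1:  "EPERM",
	2:  "ENOENT",
	4:  "EINTR",
	5:  "EIO",
	7:  "E2BIG",
	9:  "EBADF",
	11: "EAGAIN",
	12: "ENOMEM",
	13: "EACCES",
	14: "EFAULT",
	16: "EBUSY",
	22: "EINVAL",
	28: "ENOSPC",
}

// errnoName returns the symbolic name for a DNS record errno; 0 means success.
func errnoName(errno uint8) string {
	if errno == 0 {
		return "OK"
	}
	if name, ok := errnoBase[errno]; ok {
		return name
	}
	return fmt.Sprintf("errno(%d)", errno)
}

func main() {
	fmt.Println(errnoName(0), errnoName(7), errnoName(200))
}
```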

openshift-ci-robot commented 9 months ago

@msherif1234: This pull request references NETOBSERV-1379 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.15.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/218): >## Description > >it was noticed with DNS over TCP that the handshake packets are too small and it doesn't contain any DNS data from UI DNS fields showed as `n/a` which is true > >this PR propagate the DNS lookup return code so we can differentiate between cases like TCP handshake case and real errors enriching with DNS info. > >when we get N/A `bpf_skb_load_bytes()` return an error >```golang >working >====== > "dns_record": { > "id": 40514, > "flags": 34176, > "latency": 185794, > "errno": 0, > "offset": 68, > "tcp_len": 34, > "skb_len": 291 > }, > >none working >========== > "dns_record": { > "id": 0, > "flags": 0, > "latency": 0, > "errno": 7, > "offset": 68, > "tcp_len": 34, > "skb_len": 66 > }, > >``` >in the none working case the received packet is very small that is why we can't fetch DNS header info. > >Those small packets are TCP handshake and those packets don't have data by definition this clear when u look at the pcap > >![image (1)](https://github.com/netobserv/netobserv-ebpf-agent/assets/12748167/866f0452-b44d-4a6f-a688-671db15ccd9b) > >## Dependencies > >https://github.com/netobserv/network-observability-console-plugin/pull/425 > >## Checklist > >If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that. > >* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist. >* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._ >* [x] Does this PR require product documentation? 
> * [x] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs. >* [x] Does this PR require a product release notes entry? > * [ ] If so, fill in "Release Note Text" in the JIRA. >* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc. > * [ ] If so, make sure it is described in the JIRA ticket. >* QE requirements (check 1 from the list): > * [x] Standard QE validation, with pre-merge tests unless stated otherwise. > * [ ] Regression tests only (e.g. refactoring with no user-facing change). > * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team). > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:054faa4

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=054faa4 make set-agent-image

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:138178c

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=138178c make set-agent-image

msherif1234 commented 9 months ago

/ok-to-test

github-actions[bot] commented 9 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:9bed2e9

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=9bed2e9 make set-agent-image

openshift-ci-robot commented 9 months ago

@msherif1234: This pull request references NETOBSERV-1379 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.15.0" version, but no target version was set.

In response to [this](https://github.com/netobserv/netobserv-ebpf-agent/pull/218): >## Description > >it was noticed with DNS over TCP that the handshake packets are too small and it doesn't contain any DNS data from UI DNS fields showed as `n/a` which is true > >this PR propagate the DNS lookup return code so we can differentiate between cases like TCP handshake case and real errors enriching with DNS info. > >when we get N/A `bpf_skb_load_bytes()` return an error >```golang >working >====== > "dns_record": { > "id": 40514, > "flags": 34176, > "latency": 185794, > "errno": 0, > "offset": 68, > "tcp_len": 34, > "skb_len": 291 > }, > >none working >========== > "dns_record": { > "id": 0, > "flags": 0, > "latency": 0, > "errno": 7, > "offset": 68, > "tcp_len": 34, > "skb_len": 66 > }, > >``` >in the none working case the received packet is very small that is why we can't fetch DNS header info. > >Those small packets are TCP handshake and those packets don't have data by definition this clear when u look at the pcap > >![image (1)](https://github.com/netobserv/netobserv-ebpf-agent/assets/12748167/866f0452-b44d-4a6f-a688-671db15ccd9b) > >## Dependencies > >https://github.com/netobserv/network-observability-console-plugin/pull/425 >https://github.com/netobserv/flowlogs-pipeline/pull/533 > >## Checklist > >If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that. > >* [ ] Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist. >* [ ] Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix _(in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes)._ >* [x] Does this PR require product documentation? 
> * [x] If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs. >* [x] Does this PR require a product release notes entry? > * [ ] If so, fill in "Release Note Text" in the JIRA. >* [ ] Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc. > * [ ] If so, make sure it is described in the JIRA ticket. >* QE requirements (check 1 from the list): > * [x] Standard QE validation, with pre-merge tests unless stated otherwise. > * [ ] Regression tests only (e.g. refactoring with no user-facing change). > * [ ] No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team). > Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
Amoghrd commented 8 months ago

/label qe-approved

openshift-ci-robot commented 8 months ago

@msherif1234: This pull request references NETOBSERV-1379 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.15.0" version, but no target version was set.

msherif1234 commented 8 months ago

/approve

openshift-ci[bot] commented 8 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234


Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/netobserv/netobserv-ebpf-agent/blob/main/OWNERS)~~ [msherif1234]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment