netobserv / netobserv-ebpf-agent

Network Observability eBPF Agent
Apache License 2.0
116 stars 29 forks source link

Fix issue with inconsistent GenericMap output #297

Closed jotak closed 3 months ago

jotak commented 3 months ago

Description

The FLP GenericMap provided by the agent differs in types depending on the exporter being used, especially with DirectFLP each records size is smaller than when used with protobuf, due to protobuf using uint32 almost everywhere. This is not a problem in most cases as FLP generally converts that in JSON, but it breaks the FLP IPFIX export which expects strict types.

This commit brings more consistency between the FLP-direct mode and the protobuf mode, by sharing the same GenericMap encoding function. Which means the protobuf decode is now done in two steps: first proto-to-flow.Record, then flow.Record-to-GenericMap.

Dependencies

FLP update of the IPFIX exporter: https://github.com/netobserv/flowlogs-pipeline/pull/634

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

github-actions[bot] commented 3 months ago

New image: quay.io/netobserv/netobserv-ebpf-agent:e1397a0

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=e1397a0 make set-agent-image
codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 95.38462% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 33.17%. Comparing base (58d01d9) to head (dffe40a). Report is 1 commits behind head on main.

Files Patch % Lines
pkg/decode/decode_protobuf.go 94.82% 2 Missing and 1 partial :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #297 +/- ## ========================================== - Coverage 36.74% 33.17% -3.57% ========================================== Files 42 40 -2 Lines 3813 3587 -226 ========================================== - Hits 1401 1190 -211 + Misses 2334 2322 -12 + Partials 78 75 -3 ``` | [Flag](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/297/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/netobserv/netobserv-ebpf-agent/pull/297/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv) | `33.17% <95.38%> (-3.57%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=netobserv#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

msherif1234 commented 3 months ago

any cpu impact for the additional processing with ur scale test ?

msherif1234 commented 3 months ago

why this is marked as breaking change ?

jotak commented 3 months ago

why this is marked as breaking change ?

That's a breaking change because the types in the GenericMap output change, as you can see here: https://github.com/netobserv/netobserv-ebpf-agent/pull/297/files#diff-5b98b198dc9b743f2c1c2d470d443d9ec2ed38b4ee15777cb857584ddacce657 For instance "Proto" was u8 in the inner model, cast as uint32 in protobuf, previously with the grpc exporter it was still uint32 in the GenericMap but now it's back to uint8 in that Map.

So, consumers who would cast explicitly to uint32 would now have to cast to uint8

msherif1234 commented 3 months ago

/lgtm

jotak commented 3 months ago

any cpu impact for the additional processing with ur scale test ?

Slight increase of CPU in one of the tests (0.167 -> 0.172) and memory as well (671MB -> 679 MB) (i picked the one run that seems more relevant as it showed a similar number of flows per seconds). I'd say it's in reasonable bounds...

jotak commented 3 months ago

/approve marking no-qe as there's no visible output

openshift-ci[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jotak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/netobserv/netobserv-ebpf-agent/blob/main/OWNERS)~~ [jotak] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
jotak commented 3 months ago

/retest-required

jotak commented 3 months ago

/retest