netobserv / flowlogs-pipeline

Transform flow logs into metrics
Apache License 2.0
77 stars 24 forks source link

NETOBSERV-1248 deduper using infinispan cache #639

Closed jpinsonneau closed 1 month ago

jpinsonneau commented 8 months ago

Description

Implements infinispan shared cache:

Merged flows contains a list of interfaces / ifdirections with a * next to names coming from Egress flow: Screenshot from 2024-03-18 20-19-43 Screenshot from 2024-03-18 20-19-55

Steps to use this code:

These will provide a service cache.netobserv.svc.cluster.local on port 11222.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

openshift-ci[bot] commented 8 months ago

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] commented 8 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Once this PR has been reviewed and has the lgtm label, please ask for approval from jpinsonneau. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[OWNERS](https://github.com/netobserv/flowlogs-pipeline/blob/main/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
jotak commented 8 months ago

The new version is better but still increases the overall resource usage in my tests, although it's decrease for Loki & storage: https://docs.google.com/spreadsheets/d/1qakBaK1dk_rERO30k1cSR4W-Nn0SXW4A3lqQ1sZC4rE/edit#gid=705512397

(By overall I mean: everything deployed in netobserv* namespaces - so that includes the infinispan deployment and loki)

CPU overall +146% (1.68) Memory overall +30% (3GB)

Loki CPU -44% (0.035) Loki memory -46% (652MB) Loki storage -53%

So the benefit here is really on storage, and comes at a price in CPU and memory. We could run tests at a bigger scale to check if it's different.

Also one thing that seems wrong from my tests is that the captured traffic has dropped quite a lot, as if we were removing more flows than expected. That's to investigate.

jpinsonneau commented 7 months ago

Rebased this PR and added CacheEndpoint config to condition cache usage

openshift-merge-robot commented 7 months ago

PR needs rebase.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
jpinsonneau commented 3 months ago

@jotak is this something we should still consider ?

I can replace the json Marshal / Unmarshal by fmt.Sprintf to improve FLP CPU usage but we will still have the infinispan resource impact :thinking: If you think it's not worth I will just close that PR.