netsampler / goflow2

High performance sFlow/IPFIX/NetFlow Collector
BSD 3-Clause "New" or "Revised" License

Checking if there are any missing flows to be captured by the collector #259

Closed GeorgeMaged closed 7 months ago

GeorgeMaged commented 11 months ago

Hello there, I am using GoFlow2 right now for a PoC, and I also have another NetFlow collector that already exists in my environment. I have a doubt: how can I make sure that the GoFlow2 collector actually captures all the flows exported to it? How can I check whether any flows are missing? When I count the flows in a 5-minute window in ClickHouse, I find a difference between the totals received by GoFlow2 and by my existing collector, even though both should receive the same flows exported from the router (the flow exporter). Where can we start investigating in that case, and what differences/configurations can be set on the collector side?

Also, for more accurate results, CacheActiveTout on the exporter is configured to 300 seconds to avoid cache saturation and data loss. Is there a corresponding configuration to be done on the collector (GoFlow2) side?

lspgn commented 11 months ago

Hello, thank you for evaluating GoFlow2. Please specify which version you are using: there is a bug in v2.0.0 that is fixed in main. You can check the number of flows received using the Prometheus endpoint. How many flows per second are you emitting? I'm missing a lot of information about your environment to be able to provide guidance (hardware, software used, ...). Have a look at the performance doc too.
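
For example, a quick check (a sketch assuming the default metrics address of :8080; exact metric names vary between GoFlow2 versions, so grep loosely):

    # Scrape GoFlow2's Prometheus endpoint and look for flow/packet
    # counters and decoder errors.
    curl -s http://localhost:8080/metrics | grep -iE 'flow|decoder|error'

Comparing the rate of these counters against the exporter's own statistics over the same window is a reasonable first step.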

The only thing that refers to CacheActiveTout is a Cisco documentation page, but I cannot find a description of what it does.

GeorgeMaged commented 10 months ago

Actually, I am now using GoFlow2 v2. When running the containers, goflow2 can't start as it can't find format=bin, and its logs are:

2023-12-27 08:59:49 time="2023-12-27T06:59:49Z" level=fatal msg="Format bin not found"
2023-12-27T06:59:50.792073374Z time="2023-12-27T06:59:50Z" level=fatal msg="Format bin not found"
2023-12-27T06:59:52.514715087Z time="2023-12-27T06:59:52Z" level=fatal msg="Format bin not found"
2023-12-27T06:59:54.731755935Z time="2023-12-27T06:59:54Z" level=fatal msg="Format bin not found"
2023-12-27T06:59:56.286318676Z time="2023-12-27T06:59:56Z" level=fatal msg="Format bin not found"
2023-12-27T06:59:58.926916197Z time="2023-12-27T06:59:58Z" level=fatal msg="Format bin not found"
2023-12-27T07:00:02.831912213Z time="2023-12-27T07:00:02Z" level=fatal msg="Format bin not found"

When I comment out this line in the docker-compose file, it starts, but then there is a problem consuming the flows topic in ClickHouse.

lspgn commented 10 months ago

Can you give me the whole command line and more information about your environment (Docker version, etc.)?

GeorgeMaged commented 10 months ago

Docker Version:
Client:
 Cloud integration: v1.0.35+desktop.5
 Version:           24.0.6
 API version:       1.43
 Go version:        go1.20.7
 Git commit:        ed223bc
 Built:             Mon Sep  4 12:32:48 2023
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.24.0 (122432)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:32:16 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

The command in the goflow2 container:

   goflow2:
    build:
      context: ../../
      dockerfile: Dockerfile
      args:
        VERSION: compose
        LDFLAGS: -X main.version=compose
    image: netsampler/goflow2
    depends_on:
      - kafka
    ports:
      - 8080:8080
      - 6343:6343/udp
      - 7779:2055/udp
    restart: always
    command: 
    - -transport.kafka.brokers=172.29.80.1:6011
    - -transport=kafka
    - -transport.kafka.topic=flows
    - -format=bin

lspgn commented 10 months ago

Can you run docker images?

Then make sure you have the latest version of netsampler/goflow2 by doing docker pull netsampler/goflow2 or adding :e4a14c2 after the image name in docker-compose.yml:

image: netsampler/goflow2:e4a14c2
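
For reference, a minimal sketch of that sequence (the recreate step is an assumption about how the service was started, not from the thread):

    # Check which goflow2 image/tag is present locally.
    docker images | grep goflow2

    # Pull the latest published image...
    docker pull netsampler/goflow2

    # ...and recreate the service so the new image is actually used
    # (assumes Docker Compose v2).
    docker compose up -d --force-recreate goflow2
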
GeorgeMaged commented 10 months ago

That solved the issue! But there is one last problem, regarding the format. I am using a dummy NetFlow generator for testing purposes, and here is the format of the flows inside ClickHouse:

[screenshot: flow rows as displayed in ClickHouse]

and here is the format inside the flows topic of Kafka:

[screenshot: messages in the Kafka flows topic]

What am I missing? I am using the 'Protobuf' format in the ClickHouse create.sh.

lspgn commented 10 months ago

For ClickHouse: some data is stored as raw binary for performance. I don't know DBeaver, but using direct SQL, the IP addresses in text form can be obtained using functions like the following:

SELECT if(etype = 0x800, IPv4NumToString(reinterpretAsUInt32(substring(reverse(src_addr), 13,4))), IPv6NumToString(src_addr)) as srcip

I would also recommend avoiding queries against flows, since it is a "transition" table before flows_raw: any data read from it is consumed and will not be inserted downstream.
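
As a hedged sketch (reusing the expression above; the table and column names come from the compose's ClickHouse schema and may differ in your version), a complete query against flows_raw could look like:

    # Query the materialized flows_raw table rather than the Kafka-engine
    # "flows" table, so the inspection query does not consume rows.
    clickhouse-client --query "
      SELECT
        if(etype = 0x800,
           IPv4NumToString(reinterpretAsUInt32(substring(reverse(src_addr), 13, 4))),
           IPv6NumToString(src_addr)) AS srcip,
        count() AS flows
      FROM flows_raw
      GROUP BY srcip
      ORDER BY flows DESC
      LIMIT 10"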

For Kafka: the data is serialized as protobuf, which is a binary format. You need other tools to explore it.

This won't be compatible with ClickHouse without modifications, but if you pass -format=json or -format=text instead of -format=bin, you'll get human-readable messages inside Kafka, albeit at lower performance due to the encoding and the extra traffic.
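
For example, a quick way to peek at the topic (assuming kcat, formerly kafkacat, is available; the broker address is taken from the compose snippet earlier in the thread):

    # Consume five messages from the flows topic. With -format=json these
    # are readable JSON objects; with -format=bin they are raw protobuf.
    kcat -b 172.29.80.1:6011 -t flows -C -c 5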

GeorgeMaged commented 10 months ago

Yes, it is readable, but my issue is that when I choose -format=json or -format=text, nothing is consumed into ClickHouse and the database is empty.

lspgn commented 10 months ago

Yes: it's on purpose to optimize query performance and storage. Other behaviors won't be supported.

If you wish to use JSON, you need to edit the various flows_* tables from the Protobuf format into JSON and make sure the names are correctly mapped from the JSON payload into the columns. This will also break the Grafana dashboards shipped within the compose.
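
As a rough sketch only (the column list below is illustrative, not the full schema from create.sh, and the JSON field names must be checked against what your GoFlow2 version emits with -format=json):

    # Recreate the Kafka-engine "flows" table to consume JSON
    # (JSONEachRow = newline-delimited JSON) instead of Protobuf.
    # Broker/topic/group values are placeholders from the compose setup;
    # the column names are assumptions to be verified against the output.
    clickhouse-client --multiquery --query "
      DROP TABLE IF EXISTS flows;
      CREATE TABLE flows (
        time_received_ns UInt64,
        src_addr String,
        dst_addr String,
        bytes UInt64,
        packets UInt64
      ) ENGINE = Kafka()
      SETTINGS
        kafka_broker_list = 'kafka:9092',
        kafka_topic_list = 'flows',
        kafka_group_name = 'clickhouse',
        kafka_format = 'JSONEachRow';"

The flows_raw table and the materialized view feeding it would need the same column changes.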

lspgn commented 7 months ago

Closing as stale