pouriyajamshidi / flat

Measure UDP and TCP connection latency for IPv4 and IPv6 using eBPF and Go
GNU General Public License v3.0
20 stars 3 forks source link

IPv4 padding becomes random after a few packets #30

Closed cha0sCat closed 4 months ago

cha0sCat commented 4 months ago

When using the original code, after the program outputs about 150 lines of connection logs at the beginning, there is no more output.

Snipaste_2024-04-30_10-39-31

So, I add some logs to it https://github.com/cha0sCat/flat/blob/a0010830e35163d96aa2106a12003ae67538b656/internal/probe/probe.go#L233 https://github.com/cha0sCat/flat/blob/a0010830e35163d96aa2106a12003ae67538b656/internal/probe/probe.go#L242

Then I found that the IP prefix becomes random after a few seconds of the program running. (This is an ipv4 connection actually)

2024/04/30 01:59:26 Received packet: [206 236 2 144 90 88 2 0 48 0 255 255 10 10 32 54 103 226 48 129 90 88 2 0 48 0 255 255 10 10 32 37 3 39 126 37 6 64 2 0 203 96 209 159 90 88 2 0]
2024/04/30 01:59:26 Received packetAttrs: {SrcIP:ceec:290:5a58:200:3000:ffff:a0a:2036 DstIP:67e2:3081:5a58:200:3000:ffff:a0a:2025 SrcPort:807 DstPort:32293 Protocol:6 TTL:64 Syn:false Ack:false TimeStamp:660096205021387}
2024/04/30 01:59:26 Received packet: [164 1 3 144 90 88 2 0 48 0 255 255 10 10 34 92 110 59 49 129 90 88 2 0 48 0 255 255 10 10 32 37 56 224 122 209 6 64 2 0 109 101 209 159 90 88 2 0]
2024/04/30 01:59:26 Received packetAttrs: {SrcIP:a401:390:5a58:200:3000:ffff:a0a:225c DstIP:6e3b:3181:5a58:200:3000:ffff:a0a:2025 SrcPort:14560 DstPort:31441 Protocol:6 TTL:64 Syn:false Ack:false TimeStamp:660096205022573}

please notice that we can still find the actual ipv4 address in the raw packet: Snipaste_2024-04-30_11-00-51

comparing to normal packets:

So it turns out there are no more logs because these packets have unique random IP so they can't match with others.

The root cause is that in6_addr is not initialized before use, so it may be a random value

cha0sCat commented 4 months ago

PR sent #31

I'm very new to C so please bear with me if the code looks bad :P

pouriyajamshidi commented 4 months ago

Hey @cha0sCat,

Thanks for debugging this and the detailed explanation.

Please give me some time to review your PR.

pouriyajamshidi commented 4 months ago

@cha0sCat, I am not able to reproduce this. I use both IPv4 and IPv6, running the program for hours but the IPv4 never starts with anything but SrcIP:::ffff:x.x.x.x.

  1. Are you using the latest commit?
  2. What kernel version are you using?
cha0sCat commented 4 months ago

For C, reading from uninitialized var is an undefined behaviour, so the behaviour might be various on different OS and compilers. (https://www.learncpp.com/cpp-tutorial/uninitialized-variables-and-undefined-behavior/)

  1. Yes, I'm using the latest commit
  2. For building, I use GitHub actions with the ubuntu-latest image; I'm not sure what the exact kernel version is. Here's the binary file: https://github.com/cha0sCat/flat/actions/runs/8887259908 (you can also use this compiled version to see if it can reproduce)

For running I use Linux HOSTNAMEHERE 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64 x86_64 x86_64 GNU/Linux with ubuntu 20.04

Some additional context:

  1. I run flat on a heavy network load VM with about 200MByte/s in & out. (So it might be reusing the same block of memory very frequently)
  2. To build on ubuntu-latest and run on older ubuntu-20.04 without any glibc version issue, I mod the build command to make a fully static build (CGO=0 and -static). For the full compile step, please check the workflow file: https://github.com/cha0sCat/flat/actions/runs/8887259908/workflow
pouriyajamshidi commented 4 months ago

Nice explanation.

I have a workaround for this and think it is a good time for you and I to test this together.

I will make some changes to the code, create a static release and then let you know. Really interested to see how much of a difference it makes.

pouriyajamshidi commented 4 months ago

@cha0sCat so, I added a static release here:

https://github.com/pouriyajamshidi/flat/releases/download/v0.2.0/flat

It is based on a change here:

https://github.com/pouriyajamshidi/flat/blob/c6f34aedff6bc77259524d51ab86681730f3209b/bpf/flat.c#L146

Please give it a shot and let me know if it solves it.

cha0sCat commented 4 months ago

No, the issue still exists; I'm using the exact same VM as the first time.

Also, I want to correct my previous report. This issue happens not a few seconds after running but almost less than one second.

output.webm

pouriyajamshidi commented 4 months ago

ah, that is a shame. I was hoping that the NULL assignment would fix it.

I understand that your PR fixes it but I was hoping to fix it more fundamentally (ensuring other packet_t fields are not affected as well) since we have a good test case here.

Could you please also give the latest release one more shot? The link:

https://github.com/pouriyajamshidi/flat/releases/tag/v0.2.1

The change:

https://github.com/pouriyajamshidi/flat/blob/9f12e8484452c7643cff0b82a6acea3563690584/bpf/flat.c#L155

Once again, thanks for the elaborate testing and level of details!

cha0sCat commented 4 months ago

Works great this time. It's a pleasure to cooperate with you. The tool is also impressive: lightweight, good for latency spiking diagnostic, super feat our use cases

output.webm

pouriyajamshidi commented 4 months ago

WOW! you got some traffic on that machine :D

Thanks a lot for your report, debugging and suggestions mate. Was very fun!

Btw, if you want, you can modify your PR to include the changes in the branch that fixed this for you. It might sounds a bit silly but since you did a lot of work on this, I think it'd be nice to have your name as a contributor of this project.

cha0sCat commented 4 months ago

Yes, I've cherrypicked your commit and cleaned up mine.

I also noticed that the portable version already has a code for initialize: https://github.com/pouriyajamshidi/flat/blob/8dfba7e197e4474c8c61f6c7f8439686f094b09d/bpf/flat_portable.c#L142

Maybe we can make them same? but not sure memset and = {0} which performance better

pouriyajamshidi commented 4 months ago

The portable version uses the perfevent instead of ringbuf since ringbuf is not available on older Linux versions.

On the "non-portable" version we cannot zero-initialize pkt since ringbuf would not accept it :S

cha0sCat commented 4 months ago

😂 cool

pouriyajamshidi commented 4 months ago

fixed in #31