prometheus / procfs

procfs provides functions to retrieve system, kernel and process metrics from the pseudo-filesystem proc.
Apache License 2.0
755 stars 311 forks source link

fix: same TCP connection appears twice #631

Closed weidongkl closed 2 months ago

weidongkl commented 2 months ago

/proc/net/{TCP, TCP6, UDP, UDP6} are dynamic , and when we read these files, we should read them all at once. there will be data consistency issues if using line by line reading

fix: https://github.com/prometheus/procfs/issues/576

discordianfish commented 2 months ago

Makes sense but I'm still worried about the increased memory usage here. These can be significant sized.. @SuperQ wdyt?

weidongkl commented 2 months ago

net-tools used setvbuf _IOFBF to read tcp file at once too. data consistency should be more important than the increased memory usage

weidongkl commented 2 months ago

@SuperQ wdyt

discordianfish commented 2 months ago

Yeah I think we'll need to do this

dswarbrick commented 2 months ago

net-tools used setvbuf _IOFBF to read tcp file at once too. data consistency should be more important than the increased memory usage

net-tools does not read the entire file at once. It merely uses buffered IO, with a buffer of PAGE_SIZE:

FILE *proc_fopen(const char *name)
{
    static char *buffer;
    static size_t pagesz;
    FILE *fd = fopen(name, "r");

    if (fd == NULL)
      return NULL;

    if (!buffer) {
      pagesz = getpagesize();
      buffer = malloc(pagesz);
    }   

    setvbuf(fd, buffer, _IOFBF, pagesz);
    return fd; 
}

I suspect that an equivalent approach in Go, which would have only required a single line change, would be to replace the previous io.LimitReader with a bufio.NewReaderSize(f, os.Getpagesize()).

SuperQ commented 2 months ago

Actually, this may be worse. os.ReadFile() attempts to open and read into a buffer the size of the file in one read. But only if it can stat the size of the file. Otherwise it reads 512 bytes at a time.

The proc files in question have no size.

$ stat /proc/net/tcp
  File: /proc/net/tcp
  Size: 0           Blocks: 0          IO Block: 1024   regular empty file
Device: 17h/23d Inode: 4026532077  Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-04-16 09:48:28.233648680 +0200
Modify: 2024-04-16 09:48:28.233648680 +0200
Change: 2024-04-16 09:48:28.233648680 +0200
 Birth: -

So this change likely does not fix the problem and may make it worse.

I think I'm going to revert this, without proof that this fix actually works, I think the original code is actually what we want.

dswarbrick commented 2 months ago

@SuperQ The os.Readfile approach indeed results in some unusual read syscall behaviour:

openat(AT_FDCWD, "/proc/net/tcp", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_create1(EPOLL_CLOEXEC)            = 4
pipe2([5, 6], O_NONBLOCK|O_CLOEXEC)     = 0
epoll_ctl(4, EPOLL_CTL_ADD, 5, {events=EPOLLIN, data={u32=6294888, u64=6294888}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=513802241, u64=9100870932907425793}}) = 0
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "  sl  local_address rem_address "..., 512) = 512
read(3, "0000 00000000   101        0 167"..., 384) = 384
read(3, "   \n   5: 0601A8C0:9BBA 86C8F950"..., 512) = 512
read(3, "00000000 00000000  1000        0"..., 640) = 640
read(3, " 0000000000000000 52 4 26 7 7   "..., 1024) = 1024
read(3, "000  1000        0 128169 2 0000"..., 1024) = 1024
read(3, "00000000 01:00000062 00000004  1"..., 1280) = 1280
read(3, "1 -1                   \n  35: 06"..., 1536) = 1536
read(3, "C0:81D6 A7619125:5D91 01 0000000"..., 2560) = 2560
read(3, "B6F5A2:A5B4 01 00000009:00000000"..., 2816) = 2816
read(3, "           \n  81: 0601A8C0:8E98 "..., 4096) = 462
read(3, "", 3634)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc000147bf4) = 0
close(3)

It appears to start with a 512-byte read, but gradually increases (apart from a strange decrease to 384 bytes for the second read).

Compare that with the original io.LimitReader approach:

openat(AT_FDCWD, "/proc/net/tcp", O_RDONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL)                       = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_create1(EPOLL_CLOEXEC)            = 4
pipe2([5, 6], O_NONBLOCK|O_CLOEXEC)     = 0
epoll_ctl(4, EPOLL_CTL_ADD, 5, {events=EPOLLIN, data={u32=6299176, u64=6299176}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=3198156801, u64=8984441875505086465}}) = 0
read(3, "  sl  local_address rem_address "..., 4096) = 4050
read(3, "  26: 0601A8C0:B45E 86A4E9AA:C8D"..., 4096) = 4050
read(3, "  53: 0601A8C0:A0D6 2EB6F5A2:A5B"..., 4096) = 3300
read(3, "", 4096)                       = 0
epoll_ctl(4, EPOLL_CTL_DEL, 3, 0xc000147c04) = 0
close(3)

Incidentally, my suggested bufio.Reader approach looks identical to the io.LimitReader strace, which is no surprise I guess, if they're both using the same underlying io.Reader.

dswarbrick commented 2 months ago

Also for comparison, the (deprecated) netstat command:

openat(AT_FDCWD, "/proc/net/tcp", O_RDONLY) = 3
read(3, "  sl  local_address rem_address "..., 4096) = 4050
read(3, "  26: 0601A8C0:8938 8416AB51:05A"..., 4096) = 4050
read(3, "  53: 0601A8C0:8102 182F853E:589"..., 4096) = 4050
read(3, "  80: 0601A8C0:B44A B6126749:D98"..., 4096) = 4050
read(3, " 107: 0601A8C0:8890 6D39D766:DE9"..., 4096) = 4050
read(3, " 134: 0601A8C0:A3A4 9630C658:1AE"..., 4096) = 300
read(3, "", 4096)                       = 0
close(3)

So if anything, the original code most closely resembles how net-tools reads /proc/net/tcp.

Now, if both prometheus/procfs and net-tools produce inconsistent / duplicated entries, then I would say that it's time to use rtnetlink to get this information, as iproute2 does with the ss (socket statistics) tool.

SuperQ commented 2 months ago

We already switched to netlink in the node_exporter tcpstat collector.

dswarbrick commented 2 months ago

Exactly - so should these legacy functions that parse /proc/net/tcp{,6} be marked as deprecated to discourage people from using them?

SuperQ commented 2 months ago

Yea, maybe not a bad idea. @discordianfish What do you think of marking this reader/parser as deprecated?

discordianfish commented 2 months ago

@SuperQ yeah that makes sense