Open kellyjonbrazil opened 4 months ago
While running our monitoring script which uses jc
to parse netstat
output on linux, it started throwing "index out of range" errors. I added some debug printing to jc/parsers/netsat_linux.py
to see what is going on.
First one was to see the data comming in
def parse_network(headers, entry):
LIST_OF_STATES = [
"ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
"TIME_WAIT", "CLOSED", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING",
"UNKNOWN", "7"
]
# split entry based on presence of value in "State" column
contains_state = any(state in entry for state in LIST_OF_STATES)
split_modifier = 1 if contains_state else 2
### DEBUG PRINT ###
print(entry)
entry = entry.split(maxsplit=len(headers) - split_modifier)
output:
tcp 0 0 0.0.0.0:9898 0.0.0.0:* LISTEN 1178/pgpool
tcp 0 0 192.168.68.116:9102 0.0.0.0:* LISTEN 584/bareos-fd
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 600/sshd: /usr/sbin
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN 1178/pgpool
tcp 0 0 0.0.0.0:5433 0.0.0.0:* LISTEN 1676/postgres
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN 928/exim4
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 1817931/zabbix_agen
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 1187/pgpool: watchd
tcp6 0 0 :::9898 :::* LISTEN 1178/pgpool
tcp6 0 0 :::22 :::* LISTEN 600/sshd: /usr/sbin
tcp6 0 0 :::5432 :::* LISTEN 1178/pgpool
tcp6 0 0 :::5433 :::* LISTEN 1676/postgres
tcp6 0 0 :::10050 :::* LISTEN 1817931/zabbix_agen
udp 0 0 0.0.0.0:68 0.0.0.0:* 535/dhclient
udp 0 0 0.0.0.0:37300 0.0.0.0:* 1204/pgpool: heartb
udp 0 0 0.0.0.0:9694 0.0.0.0:* 1205/pgpool: heartb
udp 0 0 0.0.0.0:9694 0.0.0.0:* 1203/pgpool: heartb
udp 0 0 0.0.0.0:44649 0.0.0.0:* 1206/pgpool: heartb
udp 0 0 0.0.0.0:52868 0.0.0.0:* 494/rsyslogd
The interesting part comes here, It is clear that none of the udp
processes has value in the STATE
column, howerver after running the script with this debug print added, we can see the error:
def parse_network(headers, entry):
LIST_OF_STATES = [
"ESTABLISHED", "SYN_SENT", "SYN_RECV", "FIN_WAIT1", "FIN_WAIT2",
"TIME_WAIT", "CLOSED", "CLOSE_WAIT", "LAST_ACK", "LISTEN", "CLOSING",
"UNKNOWN", "7"
]
# split entry based on presence of value in "State" column
contains_state = any(state in entry for state in LIST_OF_STATES)
split_modifier = 1 if contains_state else 2
entry = entry.split(maxsplit=len(headers) - split_modifier)
### DEBUG PRINT ###
print(entry, len(entry))
['tcp', '0', '0', '0.0.0.0:9898', '0.0.0.0:*', 'LISTEN', '1178/pgpool '] 7
['tcp', '0', '0', '192.168.68.116:9102', '0.0.0.0:*', 'LISTEN', '584/bareos-fd '] 7
['tcp', '0', '0', '0.0.0.0:22', '0.0.0.0:*', 'LISTEN', '600/sshd: /usr/sbin '] 7
['tcp', '0', '0', '0.0.0.0:5432', '0.0.0.0:*', 'LISTEN', '1178/pgpool '] 7
['tcp', '0', '0', '0.0.0.0:5433', '0.0.0.0:*', 'LISTEN', '1676/postgres '] 7
['tcp', '0', '0', '127.0.0.1:25', '0.0.0.0:*', 'LISTEN', '928/exim4 '] 7
['tcp', '0', '0', '0.0.0.0:10050', '0.0.0.0:*', 'LISTEN', '1817931/zabbix_agen '] 7
['tcp', '0', '0', '0.0.0.0:9000', '0.0.0.0:*', 'LISTEN', '1187/pgpool: watchd '] 7
['tcp6', '0', '0', ':::9898', ':::*', 'LISTEN', '1178/pgpool '] 7
['tcp6', '0', '0', ':::22', ':::*', 'LISTEN', '600/sshd: /usr/sbin '] 7
['tcp6', '0', '0', ':::5432', ':::*', 'LISTEN', '1178/pgpool '] 7
['tcp6', '0', '0', ':::5433', ':::*', 'LISTEN', '1676/postgres '] 7
['tcp6', '0', '0', ':::10050', ':::*', 'LISTEN', '1817931/zabbix_agen '] 7
['udp', '0', '0', '0.0.0.0:68', '0.0.0.0:*', '535/dhclient '] 6
['udp', '0', '0', '0.0.0.0:37300', '0.0.0.0:*', '1204/pgpool:', 'heartb '] 7
['udp', '0', '0', '0.0.0.0:9694', '0.0.0.0:*', '1205/pgpool: heartb '] 6
['udp', '0', '0', '0.0.0.0:9694', '0.0.0.0:*', '1203/pgpool: heartb '] 6
['udp', '0', '0', '0.0.0.0:44649', '0.0.0.0:*', '1206/pgpool: heartb '] 6
['udp', '0', '0', '0.0.0.0:52868', '0.0.0.0:*', '494/rsyslogd '] 6
The row with the heartb
process gets parsed incorrectly, because it is running on port :37300
. The number 7, which is one of the recognized states after the update, gets picked up as delimetter and breaks the formatting of the row with 1204/pgpool:
ending up in the STATE column and heartb
in PID/Program name
The "index out of range error" I'm getting is because I take the output of jc and split the column PID/Program name on '/'
and access both parts of it.
In this corner case however, it breaks the script, because of the parsing error I described higher. (since there is no /
in the column)
I was localy able to fix the problem when I removed the number 7 from the LIST_OF_STATES. I remember tho, that without the 7 present in it, some tests of yours were failing.
What I would like to do, is to remove the number 7 from LIST_OF_STATES because according to the documentation it is not a valid state that can appear as output of the netstat command, and to try and debug why the tests are failing. I will update you on the progress. Is this approach okay with you?
Oh, dear - that's a nasty little bug. Thanks for looking into this - looking forward to working with you on this!
@HervisDaubeny any update on this? If not, I can take a look. Thanks!
Hello, unfortunately I'm currently flooded with work from every side. If you don't mind waiting I would love to look into it, but I can't promise you I will be able to do it soon. My estimate is that I could get to it in november.
Best regards, Sebastian
Message ID: @.***>
The current code for checking if an entry contains a specific state has a bug where partial matches can mistakenly trigger the condition. The current implementation:
contains_state = any(state in entry for state in LIST_OF_STATES)
does not enforce word boundaries, which means that it can match unintended portions of text, like matching "7" within "0.0.0.0:37300".
One potential quick fix to enforce word boundaries is to modify the code as follows:
contains_state = any(re.search(rf"\b{re.escape(state)}\b", entry) for state in LIST_OF_STATES)
What I would like to do, is to remove the number 7 from LIST_OF_STATES because according to the documentation it is not a valid state that can appear as output of the netstat command, and to try and debug why the tests are failing. I will update you on the progress. Is this approach okay with you?
It looks like the tests fail because we do see 7
as state output in some samples. For example this one on Ubuntu 18.0.4:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:42351 0.0.0.0:* LISTEN 1112/containerd
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 885/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1127/sshd
tcp6 0 0 :::22 :::* LISTEN 1127/sshd
udp 0 0 127.0.0.53:53 0.0.0.0:* 885/systemd-resolve
udp 0 0 192.168.71.131:68 0.0.0.0:* 867/systemd-network
raw6 0 0 :::58 :::* 7 867/systemd-network
Active UNIX domain sockets (only servers)
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] SEQPACKET LISTENING 20812 1/init /run/udev/control
unix 2 [ ACC ] STREAM LISTENING 33765 1723/systemd /run/user/1000/systemd/private
unix 2 [ ACC ] STREAM LISTENING 33808 1723/systemd /run/user/1000/gnupg/S.gpg-agent.ssh
unix 2 [ ACC ] STREAM LISTENING 33809 1723/systemd /run/user/1000/gnupg/S.dirmngr
unix 2 [ ACC ] STREAM LISTENING 33810 1723/systemd /run/user/1000/gnupg/S.gpg-agent.browser
unix 2 [ ACC ] STREAM LISTENING 33811 1723/systemd /run/user/1000/gnupg/S.gpg-agent
unix 2 [ ACC ] STREAM LISTENING 33812 1723/systemd /run/user/1000/gnupg/S.gpg-agent.extra
unix 2 [ ACC ] STREAM LISTENING 20655 1/init /run/systemd/private
unix 2 [ ACC ] STREAM LISTENING 20662 1/init /run/lvm/lvmetad.socket
unix 2 [ ACC ] STREAM LISTENING 20664 1/init /run/systemd/journal/stdout
unix 2 [ ACC ] STREAM LISTENING 20891 1/init /run/lvm/lvmpolld.socket
unix 2 [ ACC ] STREAM LISTENING 27473 1/init /run/acpid.socket
unix 2 [ ACC ] STREAM LISTENING 27443 1/init /run/snapd.socket
unix 2 [ ACC ] STREAM LISTENING 27445 1/init /run/snapd-snap.socket
unix 2 [ ACC ] STREAM LISTENING 27475 1/init /run/uuidd/request
unix 2 [ ACC ] STREAM LISTENING 27481 1/init /var/run/docker.sock
unix 2 [ ACC ] STREAM LISTENING 27489 1/init /var/run/dbus/system_bus_socket
unix 2 [ ACC ] STREAM LISTENING 27468 1/init /var/lib/lxd/unix.socket
unix 2 [ ACC ] STREAM LISTENING 30726 1112/containerd /run/containerd/containerd.sock
unix 2 [ ACC ] STREAM LISTENING 27436 1/init @ISCSIADM_ABSTRACT_NAMESPACE
unix 2 [ ACC ] STREAM LISTENING 25548 607/VGAuthService /var/run/vmware/guestServicePipe
Looks like we'll have to tighten up the splitting logic per @Luigi31415 's comment above.
@kellyjonbrazil I think my suggested fix does solve the problem. Let me know if you agree so that I can put in a PR.
@Luigi31415 sounds good - go ahead and use dev
as a base. Thanks!
Merged in https://github.com/kellyjonbrazil/jc/pull/608. This fix will go into the next release.
Originally posted by @HervisDaubeny in https://github.com/kellyjonbrazil/jc/issues/447#issuecomment-2199942177