jasonish / docker-suricata

A Suricata Docker image.
https://hub.docker.com/r/jasonish/suricata/
MIT License

segmentation fault on latest release on high speed traffic ... #41

Open ulysse31 opened 1 month ago

ulysse31 commented 1 month ago

Hello,

I'm using the SELKS project's Docker install, which is based on the Docker image jasonish/suricata:master-amd64. I have two nodes running the same install (SELKS). Today I updated both instances (all Docker containers, including suricata), and for some unknown reason the suricata container on one of the two started crashing in a loop, after running for about a minute. I first thought it was a SELKS issue, potentially related to rule generation, but even after wiping all containers, images, volumes, and data, the suricata container still crash-loops with a segmentation fault:

[Fri Jul 26 10:44:31 2024] W#06-bond1[78735]: segfault at 0 ip 00000000009349a9 sp 00007f853fffc270 error 4 in suricata[4d4000+637000] likely on CPU 22 (core 14, socket 0)
[Fri Jul 26 10:44:31 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa
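For readers less used to kernel segfault lines, here is a minimal sketch that decodes the fields of the first dmesg line above. The names `decode_segfault` and `PATTERN` are made up for illustration and are not part of Suricata or any tool. The bit meanings follow the x86 page-fault error code (bit 0: page present, bit 1: write access, bit 2: user mode). The computed offset is only relative to the start of the reported mapping; resolving it to a function would need the exact binary from the image and is not attempted here.

```python
import re

# First dmesg line from the report above, copied verbatim.
LINE = (
    "[Fri Jul 26 10:44:31 2024] W#06-bond1[78735]: segfault at 0 "
    "ip 00000000009349a9 sp 00007f853fffc270 error 4 "
    "in suricata[4d4000+637000] likely on CPU 22 (core 14, socket 0)"
)

# Hypothetical pattern written for this line format only.
PATTERN = re.compile(
    r"(?P<thread>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its numeric fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    err = int(m["err"], 16)
    return {
        "thread": m["thread"],
        "fault_addr": int(m["addr"], 16),   # address that was accessed
        "offset": ip - base,                # instruction offset inside the mapping
        "inside_mapping": 0 <= ip - base < int(m["size"], 16),
        "page_present": bool(err & 0x1),    # 0: page was not mapped
        "write": bool(err & 0x2),           # 0: the access was a read
        "user_mode": bool(err & 0x4),       # 1: fault happened in user space
    }


info = decode_segfault(LINE)
print(info["thread"], hex(info["fault_addr"]), hex(info["offset"]))
```

Read this way, the line describes a user-mode read of address 0 (a NULL pointer dereference) by worker thread W#06-bond1, at an instruction inside the suricata mapping.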

The output of docker logs suricata -f is attached:

suricata_docker_output.txt

The last lines are:

Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Perf: af-packet: bond1: rx ring: block_size=32768 block_nr=2 frame_size=1600 frame_nr=40 [AFPComputeRingParams:source-af-packet.c:1598]
Notice: threads: Threads created -> W: 64 FM: 1 FR: 1 Engine started. [TmThreadWaitOnThreadRunning:tm-threads.c:1905]

After that comes the segmentation fault reported in dmesg, and the container crashes and restarts in a loop.

The only difference between the two servers is that one listens on a bonding interface (bond1) and the other listens directly on a physical one. So from what I can see, it could either be related to the recent update of the suricata image (11 hours ago) or potentially a hardware issue. The latter seems unlikely, though, because there are no error messages on the host or on the switch.

Is it possible that the latest version has issues with bonded 10 Gbit interfaces? Is there any additional debugging I can enable that would give more hints? Thanks a lot.

ulysse31 commented 1 month ago

UPDATE:

I thought it might be related to bonding, but it also segfaults on the other server, the one listening directly on a physical interface:

[Fri Jul 26 06:24:41 2024] W#09-eno2np1[3532764]: segfault at 0 ip 00000000009349a9 sp 00007f1f5fffc270 error 4 in suricata[4d4000+637000] likely on CPU 4 (core 1, socket 0)
[Fri Jul 26 06:24:41 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa

This one is in the New York time zone (the other one is in the Paris time zone). So it segfaults on both. The big difference is potentially the bandwidth: one has a single 10 Gbps interface, and the other uses a bond of two 10 Gbps interfaces because of the traffic volume. To reformulate: the latest version of the suricata Docker image seems to segfault under high traffic (an average of 20 MBytes/s on bond1). The New York server is currently at around 2/3 MBytes/s (low activity, early morning).

ulysse31 commented 1 month ago

UPDATE2:

Confirmed once traffic picked up in New York:

[Fri Jul 26 08:10:34 2024] W#31-eno2np1[3671915]: segfault at 0 ip 00000000009349a9 sp 00007f31ad4f1270 error 4 in suricata[4d4000+637000] likely on CPU 6 (core 6, socket 0)
[Fri Jul 26 08:10:34 2024] Code: 74 24 50 48 85 f6 74 0b ba 01 00 00 00 ff 15 76 8c 44 00 48 89 df e8 06 06 ba ff 0f 0b 0f 1f 40 00 48 83 ec 18 48 85 d2 74 38 <0f> b6 06 89 c1 83 e1 1f 41 b8 01 00 00 00 83 f9 1f 75 5b 48 83 fa
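As a quick cross-check that all three crashes are the same fault, the sketch below compares the segfault lines reported so far. The lines are copied from this thread with the timestamp and the trailing CPU information trimmed; the helper name `signature` is made up for illustration. It only compares the logged fields and says nothing about the root cause.

```python
import re

# Segfault lines from the three reports in this thread
# (timestamp and "likely on CPU ..." suffix trimmed).
REPORTS = [
    "W#06-bond1[78735]: segfault at 0 ip 00000000009349a9 "
    "sp 00007f853fffc270 error 4 in suricata[4d4000+637000]",
    "W#09-eno2np1[3532764]: segfault at 0 ip 00000000009349a9 "
    "sp 00007f1f5fffc270 error 4 in suricata[4d4000+637000]",
    "W#31-eno2np1[3671915]: segfault at 0 ip 00000000009349a9 "
    "sp 00007f31ad4f1270 error 4 in suricata[4d4000+637000]",
]

# Capture fault address, instruction pointer, error code and mapping;
# the stack pointer is skipped because it differs per thread.
SIG = re.compile(r"segfault at (\w+) ip (\w+) sp \w+ error (\w+) in (\S+)")


def signature(line):
    """Return the fields that identify where and how the crash happened."""
    m = SIG.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    return m.groups()


signatures = {signature(line) for line in REPORTS}
threads = [line.split("[", 1)[0] for line in REPORTS]

print("distinct crash signatures:", len(signatures))
print("crashing threads:", threads)
```

All three lines share one signature (same fault address, instruction pointer, error code, and mapping), while the crashing worker thread and interface differ, which matches the observation that both servers hit the same crash.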

It seems the suricata Docker image no longer handles high traffic and crashes under it.

ulysse31 commented 1 month ago

UPDATE3:

I updated the title, since I can now confirm that the segmentation fault occurs above a certain level of traffic on both of my test systems. I've tried master-amd64, master-profiling, and master; they all end up in the same segfault crash loop under high traffic.