Closed mjeffe closed 9 years ago
I tried changing the level and overwriting a different rule to see if that triggered the bug as well, but it did not.
I have been experiencing exactly the same problem and ended up running ossec-analysisd from within gdb. Here's one hint at the possible reason for the crash - and why it was so difficult to replicate:
2014/12/17 11:37:37 ossec-analysisd: DEBUG: FTSInit completed.
2014/12/17 11:37:37 ossec-analysisd: DEBUG: Active response Init completed.
2014/12/17 11:37:37 ossec-analysisd: DEBUG: Startup completed. Waiting for new messages..
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff784d171 in __strlen_sse2 () from /lib64/libc.so.6
(gdb) bt full
#0 0x00007ffff784d171 in __strlen_sse2 () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007ffff7b6788c in GeoIP_open () from /usr/lib64/libGeoIP.so.1
No symbol table info available.
#2 0x00007ffff7fd81eb in ?? ()
No symbol table info available.
#3 0x00007ffff7fd85fe in ?? ()
No symbol table info available.
#4 0x00007ffff7fbff41 in ?? ()
No symbol table info available.
#5 0x00007ffff7fc0955 in main ()
No symbol table info available.
So my suspicion was that a null pointer is passed to GeoIP_open(), which then calls strlen() on it and crashes. Naturally, I checked the GeoIP configuration in OSSEC: while I had the <use_geoip>yes</use_geoip> option in the alerts section, there was no <geoip_db_path>/usr/share/GeoIP/GeoIP.dat</geoip_db_path> in global. After adding it, the crashes seem to have stopped.
The only fix I would suggest in OSSEC itself is reporting an error when use_geoip is enabled but no GeoIP database location is specified.
@kravietz You may have discovered another issue. I don't have any geoip stuff set (or compiled in AFAIK), and can reproduce the crash.
@ddpbsd Quite possible - I would suggest running the crashing daemon from within gdb; it provides quite useful information for finding the bug. Here's how I did it:
# cd /var/ossec/bin/
# gdb ossec-analysisd
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-75.el6)
(gdb) set args -fd
(gdb) run
And then just wait for the SIGSEGV to happen. When it does, run bt full and the results should provide a hint at the bug's location. The -fd arguments make analysisd run in the foreground with debug output.
Thanks, I'll continue to do exactly that.
Everything I've done so far definitely points at https://github.com/ossec/ossec-hids/blob/master/src/analysisd/analysisd.c#L1664 but I don't know enough to be able to figure out what's going wrong.
Like ddpbsd, my experience seems to point to the if(!currently_rule->event_search(... line, not the GeoIP stuff. I've run analysisd under gdb a couple of different ways. Below is the process I used to attach to the running analysisd process, and the output. I also tried to start analysisd with gdb as you described, but I was not confident I got the other ossec daemons started correctly.
[root@ossectst ossec]# ps -ef | grep ossec
root 5018 4903 0 12:39 pts/2 00:00:00 tail -f logs/ossec.log
root 5675 2407 0 12:50 pts/1 00:00:00 grep ossec
[root@ossectst ossec]# service ossec start
Starting OSSEC: [ OK ]
[root@ossectst ossec]# ps -ef | grep ossec
root 5018 4903 0 12:39 pts/2 00:00:00 tail -f logs/ossec.log
ossecm 5711 1 0 12:50 ? 00:00:00 /var/ossec/bin/ossec-maild
root 5715 1 0 12:50 ? 00:00:00 /var/ossec/bin/ossec-execd
ossec 5719 1 0 12:50 ? 00:00:00 /var/ossec/bin/ossec-analysisd
root 5723 1 0 12:50 ? 00:00:00 /var/ossec/bin/ossec-logcollector
ossecr 5728 1 0 12:50 ? 00:00:00 /var/ossec/bin/ossec-remoted
root 5734 1 0 12:50 ? 00:00:00 /var/ossec/bin/ossec-syscheckd
ossec 5737 1 0 12:50 ? 00:00:00 /var/ossec/bin/ossec-monitord
root 5747 2407 0 12:50 pts/1 00:00:00 grep ossec
[root@ossectst ossec]# gdb /var/ossec/bin/ossec-analysisd 5719
GNU gdb (GDB) Amazon Linux (7.6.1-51.24.amzn1)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-amazon-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /var/ossec/bin/ossec-analysisd...done.
Attaching to program: /var/ossec/bin/ossec-analysisd, process 5719
Reading symbols from /lib64/libm.so.6...Reading symbols from /usr/lib/debug/lib64/libm-2.17.so.debug...done.
done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...Reading symbols from /usr/lib/debug/lib64/libc-2.17.so.debug...done.
done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug/lib64/ld-2.17.so.debug...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libnss_files.so.2...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.17.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files.so.2
0x00007f9f2b003b53 in __recvfrom_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0 0x00007f9f2b003b53 in __recvfrom_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1 0x0000000000430569 in OS_RecvUnix (socket=4, sizet=6144, ret=0x7fffe0f85870 "1:/var/log/maillog") at os_net.c:539
#2 0x0000000000403612 in OS_ReadMSG (m_queue=4) at analysisd.c:754
#3 0x0000000000403262 in main (argc=1, argv=0x7fffe0f87268) at analysisd.c:555
(gdb) cont
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x0000000000404a0e in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x1addcd0) at analysisd.c:1631
#2 0x0000000000404a53 in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x1adcac0) at analysisd.c:1654
#3 0x0000000000404a53 in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x1ad53f0) at analysisd.c:1654
#4 0x0000000000404a53 in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x199c4e0) at analysisd.c:1654
#5 0x0000000000403a5e in OS_ReadMSG (m_queue=4) at analysisd.c:984
#6 0x0000000000403262 in main (argc=1, argv=0x7fffe0f87268) at analysisd.c:555
(gdb) list
76 #else
77
78 /* This is a "normal" system call stub: if there is an error,
79 it returns -1 and sets errno. */
80
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
82 ret
83 T_PSEUDO_END (SYSCALL_SYMBOL)
84
85 #endif
(gdb) bt full
#0 0x0000000000000000 in ?? ()
No symbol table info available.
#1 0x0000000000404a0e in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x1addcd0) at analysisd.c:1631
currently_rule = 0x1add7c0
#2 0x0000000000404a53 in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x1adcac0) at analysisd.c:1654
child_node = 0x1addcd0
child_rule = 0x0
currently_rule = 0x1adc770
#3 0x0000000000404a53 in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x1ad53f0) at analysisd.c:1654
child_node = 0x1adcac0
child_rule = 0x0
currently_rule = 0x1ad5ff0
#4 0x0000000000404a53 in OS_CheckIfRuleMatch (lf=0x1b0a490, curr_node=0x199c4e0) at analysisd.c:1654
child_node = 0x1ad53f0
child_rule = 0x0
currently_rule = 0x199c210
#5 0x0000000000403a5e in OS_ReadMSG (m_queue=4) at analysisd.c:984
rulenode_pt = 0x199c4e0
i = 765
msg = "1:(itchy) 10.0.1.0->netstat -tan |grep LISTEN |grep -v 127.0.0.1 | sort\000ossec: output: 'netstat -tan |grep LISTEN |grep -v 127.0.0.1 | sort':\ntcp 0 0 0.0.0.0:22", ' ' <repeats 18 times>, "0.0.0.0:*", ' ' <repeats 19 times>...
lf = 0x1b0a490
---Type <return> to continue, or q <return> to quit---
stats_rule = 0x1b08b00
#6 0x0000000000403262 in main (argc=1, argv=0x7fffe0f87268) at analysisd.c:555
c = -2
m_queue = 4
test_config = 0
run_foreground = 0
debug_level = 0
dir = 0x444bc0 "/var/ossec"
user = 0x444bcb "ossec"
group = 0x444bcb "ossec"
uid = 503
gid = 502
cfg = 0x444bd1 "/var/ossec/etc/ossec.conf"
(gdb) cont
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit
[root@ossectst ossec]# tail /var/log/messages
... snip ...
Dec 15 10:45:51 ossectst kernel: [ 2082.001287] ossec-analysisd[3772]: segfault at 0 ip (null) sp 00007fffa4fa8218 error 14 in ossec-analysisd[400000+65000]
This problem was also present in 2.7.1. Not sure how far back I want to go checking on this though.
Commenting out the check_diff option kept it from crashing for me. (not a solution, just troubleshooting)
I have a potential fix in this branch: https://github.com/ddpbsd/ossec-hids/tree/diff_overwrite I'm sure it's missing something, but my keyboard can't take many more hits from my forehead. It'd be great if someone could test it out a bit.
@mjeffe did you by chance get time to test out @ddpbsd's potential fix?
I did not look into it over the Christmas break. I've got a current workaround, so my production system is functional. I still want to help with this issue, however, so I will try to test the fix this week.
For some reason src/header/zlib.h and zconf.h were missing from the diff_overwrite branch, which broke the build:
In file included from os_crypto/shared/keys.c:21:0:
./os_zlib/os_zlib.h:14:18: fatal error: zlib.h: No such file or directory
#include "zlib.h"
^
I grabbed them from my current 2.8.1 src/header dir and was then able to compile and install. Now I'll try to crash it...
Well, it's been running all day with no segfault. I'll let it run over the weekend and then report back.
It ran all weekend with no segfaults. Anything else you guys need me to try?
@mjeffe Thanks for testing. I'll open a pull request.
@mjeffe Accepted into master. Closing ticket.
I recently upgraded OSSEC from version 2.7.1 to 2.8.1. The new version of analysisd kept segfaulting. After a couple of weeks of working with it, I've narrowed it down to a simple rule override in local_rules.xml where I downgrade the level. Below I describe my test scenario, where I can consistently reproduce the segfault with a bare minimum. The current environment is Amazon AWS VPC; all servers are running Amazon Linux 64-bit AMIs (based on RHEL/CentOS).
Here is what I did:
1) I spun up a brand new server (clone of my current ossec server), installed a fresh 2.8.1 ossec server. Modified local_rules.xml to look like this:
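(The rule XML itself appears to have been swallowed by the issue tracker's formatting. Purely as a hedged reconstruction of what such an overwrite typically looks like - the rule id, fields, and level below are illustrative guesses based on the thread's mention of the netstat rule and check_diff, not the reporter's actual file:)

```xml
<group name="local,">
  <!-- Illustrative only: downgrade the stock netstat-change rule.
       Rule 533 is the standard "Listened ports status (netstat)
       changed" rule, which carries check_diff. -->
  <rule id="533" level="1" overwrite="yes">
    <if_sid>530</if_sid>
    <match>ossec: output: 'netstat -tan</match>
    <check_diff />
    <description>Listened ports status (netstat) changed (downgraded).</description>
  </rule>
</group>
```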
I did NOT add any agents, but let that run for about an hour (may not be necessary).
2) I then stopped the ossec server, ran manage_agents, added one agent and restarted the ossec server. NOTE, I did not transfer the agent key, so the agent was not trying to communicate yet. I let that run for about an hour (again, may not be necessary).
3) I transferred the key to the agent - this is an existing agent; I just added the new key, changed the <server-ip> to point to my new test server, and restarted the agent and server. I began to see communication between server and agent. I let it run its initial scans - about 10 minutes.
4) Then I opened a new listening port using nc -l 7777 on the agent to try to trigger the netstat -tan rule.
5) ossec-analysisd segfaulted after about 7 minutes.
Let me know if you want any of the output or if you want me to run any other tests.
Note, I initially posted this to https://groups.google.com/forum/#!topic/ossec-list/WM3v7fmaS6I with the same subject title. dan (ddpbsd) was able to reproduce the segfault using the above information, but there was no resolution, so it was suggested I post here.