sipcapture / captagent

100% Open-Source Packet Capture Agent for HEP
https://sipcapture.org
GNU Affero General Public License v3.0

segfault at 0 ip 00007f480b0167de sp 00007f480a40ec00 error 4 in socket_pcap.so #272

Closed ciscospirit closed 1 year ago

ciscospirit commented 1 year ago

Hello,

It looks like my captagent has a problem, but I can't figure out where it comes from.

It only happens on the production server, i.e. where the SIP traffic is generated, and not on the standby node. I use captagent only for SIP capturing; RTCP is handled by RTPengine and call logs by Hepipe.js.

Any idea?

May 15 17:08:10 sp2 systemd[1]: Starting Captagent - monitoring system...
May 15 17:08:10 sp2 captagent[3986489]: [NOTICE] Loaded core config
May 15 17:08:10 sp2 systemd[1]: Started Captagent - monitoring system.
May 15 17:08:14 sp2 kernel:[27918.663166] captagent[3986495]: segfault at 0 ip 00007f480b0167de sp 00007f480a40ec00 error 4 in socket_pcap.so[7f480b015000+9000]
May 15 17:08:14 sp2 kernel:[27918.663182] Code: e0 ff ff 89 50 14 8b 95 74 ec ff ff 48 8b 85 a0 e0 ff ff 89 50 10 48 8b 45 a8 48 89 45 d0 48 8b 45 a8 48 89 45 c8 48 8b 45 d0 <0f> b6 00 c0 e8 04 0f b6 c0 89 85 6c ff ff ff 48 8d 85 40 ed ff ff
May 15 17:08:14 sp2 systemd[1]: Started Process Core Dump (PID 3986800/UID 0).
May 15 17:08:14 sp2 systemd-coredump[3986801]: Process 3986490 (captagent) of user 0 dumped core.#012#012Stack trace of thread 3986495:#012#0  0x00007f480b0167de callback_proto (socket_pcap.so + 0x47de)#012#1  0x00007f480c6cdc05 n/a (libpcap.so.0.8 + 0x9c05)#012#2  0x00007f480c6ce074 n/a (libpcap.so.0.8 + 0xa074)#012#3  0x00007f480c6d4b0e pcap_loop (libpcap.so.0.8 + 0x10b0e)#012#4  0x00007f480b018d3e proto_collect (socket_pcap.so + 0x6d3e)#012#5  0x00007f480c675ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#6  0x00007f480c5a5def __clone (libc.so.6 + 0xfddef)#012#012Stack trace of thread 3986493:#012#0  0x00007f480c5a6116 epoll_wait (libc.so.6 + 0xfe116)#012#1  0x00007f480c103b3f n/a (libuv.so.1 + 0x20b3f)#012#2  0x00007f480c0f2714 uv_run (libuv.so.1 + 0xf714)#012#3  0x00007f480c71e2d3 _run_uv_loop (transport_hep.so + 0x52d3)#012#4  0x00007f480c675ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#5  0x00007f480c5a5def __clone (libc.so.6 + 0xfddef)#012#012Stack trace of thread 3986490:#012#0  0x00007f480c59d8b3 __select (libc.so.6 + 0xf58b3)#012#1  0x000055ac9b46f114 main (captagent + 0x4114)#012#2  0x00007f480c4ced0a __libc_start_main (libc.so.6 + 0x26d0a)#012#3  0x000055ac9b46e44a _start (captagent + 0x344a)#012#012Stack trace of thread 3986494:#012#0  0x00007f480c56dc61 clock_nanosleep (libc.so.6 + 0xc5c61)#012#1  0x00007f480c573443 __nanosleep (libc.so.6 + 0xcb443)#012#2  0x00007f480c57337a sleep (libc.so.6 + 0xcb37a)#012#3  0x00007f480b8a8c40 gather_data_run (database_hash.so + 0x6c40)#012#4  0x00007f480b8a8d8e timer_loop (database_hash.so + 0x6d8e)#012#5  0x00007f480c675ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#6  0x00007f480c5a5def __clone (libc.so.6 + 0xfddef)
May 15 17:08:14 sp2 systemd[1]: systemd-coredump@1114-3986800-0.service: Succeeded.
May 15 17:08:14 sp2 systemd[1]: captagent.service: Main process exited, code=killed, status=11/SEGV
May 15 17:08:14 sp2 systemd[1]: captagent.service: Failed with result 'signal'.
May 15 17:08:14 sp2 systemd[1]: captagent.service: Scheduled restart job, restart counter is at 19.
May 15 17:08:14 sp2 systemd[1]: Stopped Captagent - monitoring system.
May 15 17:08:14 sp2 systemd[1]: Starting Captagent - monitoring system...
May 15 17:08:14 sp2 captagent[3987053]: [NOTICE] Loaded core config
May 15 17:08:14 sp2 systemd[1]: Started Captagent - monitoring system.
May 15 17:08:18 sp2 kernel:[27922.666840] captagent[3987061]: segfault at 0 ip 00007f88dde967de sp 00007f88dd28ec00 error 4 in socket_pcap.so[7f88dde95000+9000]
May 15 17:08:18 sp2 kernel:[27922.666856] Code: e0 ff ff 89 50 14 8b 95 74 ec ff ff 48 8b 85 a0 e0 ff ff 89 50 10 48 8b 45 a8 48 89 45 d0 48 8b 45 a8 48 89 45 c8 48 8b 45 d0 <0f> b6 00 c0 e8 04 0f b6 c0 89 85 6c ff ff ff 48 8d 85 40 ed ff ff
May 15 17:08:18 sp2 systemd[1]: Started Process Core Dump (PID 3987544/UID 0).
May 15 17:08:18 sp2 systemd-coredump[3987545]: Process 3987055 (captagent) of user 0 dumped core.#012#012Stack trace of thread 3987061:#012#0  0x00007f88dde967de callback_proto (socket_pcap.so + 0x47de)#012#1  0x00007f88df54dc05 n/a (libpcap.so.0.8 + 0x9c05)#012#2  0x00007f88df54e074 n/a (libpcap.so.0.8 + 0xa074)#012#3  0x00007f88df554b0e pcap_loop (libpcap.so.0.8 + 0x10b0e)#012#4  0x00007f88dde98d3e proto_collect (socket_pcap.so + 0x6d3e)#012#5  0x00007f88df4f5ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#6  0x00007f88df425def __clone (libc.so.6 + 0xfddef)#012#012Stack trace of thread 3987057:#012#0  0x00007f88df426116 epoll_wait (libc.so.6 + 0xfe116)#012#1  0x00007f88def83b3f n/a (libuv.so.1 + 0x20b3f)#012#2  0x00007f88def72714 uv_run (libuv.so.1 + 0xf714)#012#3  0x00007f88df59e2d3 _run_uv_loop (transport_hep.so + 0x52d3)#012#4  0x00007f88df4f5ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#5  0x00007f88df425def __clone (libc.so.6 + 0xfddef)#012#012Stack trace of thread 3987055:#012#0  0x00007f88df41d8b3 __select (libc.so.6 + 0xf58b3)#012#1  0x000055fe6ed50114 main (captagent + 0x4114)#012#2  0x00007f88df34ed0a __libc_start_main (libc.so.6 + 0x26d0a)#012#3  0x000055fe6ed4f44a _start (captagent + 0x344a)#012#012Stack trace of thread 3987058:#012#0  0x00007f88df3edc61 clock_nanosleep (libc.so.6 + 0xc5c61)#012#1  0x00007f88df3f3443 __nanosleep (libc.so.6 + 0xcb443)#012#2  0x00007f88df3f337a sleep (libc.so.6 + 0xcb37a)#012#3  0x00007f88de728c40 gather_data_run (database_hash.so + 0x6c40)#012#4  0x00007f88de728d8e timer_loop (database_hash.so + 0x6d8e)#012#5  0x00007f88df4f5ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#6  0x00007f88df425def __clone (libc.so.6 + 0xfddef)
May 15 17:08:18 sp2 systemd[1]: systemd-coredump@1115-3987544-0.service: Succeeded.
May 15 17:08:18 sp2 systemd[1]: captagent.service: Main process exited, code=killed, status=11/SEGV
May 15 17:08:18 sp2 systemd[1]: captagent.service: Failed with result 'signal'.
May 15 17:08:18 sp2 systemd[1]: captagent.service: Scheduled restart job, restart counter is at 20.
May 15 17:08:18 sp2 systemd[1]: Stopped Captagent - monitoring system.
May 15 17:08:18 sp2 systemd[1]: Starting Captagent - monitoring system...
May 15 17:08:18 sp2 captagent[3987663]: [NOTICE] Loaded core config
May 15 17:08:18 sp2 systemd[1]: Started Captagent - monitoring system.
May 15 17:08:22 sp2 kernel:[27926.674868] captagent[3987667]: segfault at 0 ip 00007fe29302a7de sp 00007fe292422c00 error 4 in socket_pcap.so[7fe293029000+9000]
May 15 17:08:22 sp2 kernel:[27926.674883] Code: e0 ff ff 89 50 14 8b 95 74 ec ff ff 48 8b 85 a0 e0 ff ff 89 50 10 48 8b 45 a8 48 89 45 d0 48 8b 45 a8 48 89 45 c8 48 8b 45 d0 <0f> b6 00 c0 e8 04 0f b6 c0 89 85 6c ff ff ff 48 8d 85 40 ed ff ff
May 15 17:08:22 sp2 systemd[1]: Started Process Core Dump (PID 3988073/UID 0).
May 15 17:08:22 sp2 systemd-coredump[3988074]: Process 3987664 (captagent) of user 0 dumped core.#012#012Stack trace of thread 3987667:#012#0  0x00007fe29302a7de callback_proto (socket_pcap.so + 0x47de)#012#1  0x00007fe2946e1c05 n/a (libpcap.so.0.8 + 0x9c05)#012#2  0x00007fe2946e2074 n/a (libpcap.so.0.8 + 0xa074)#012#3  0x00007fe2946e8b0e pcap_loop (libpcap.so.0.8 + 0x10b0e)#012#4  0x00007fe29302cd3e proto_collect (socket_pcap.so + 0x6d3e)#012#5  0x00007fe294689ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#6  0x00007fe2945b9def __clone (libc.so.6 + 0xfddef)#012#012Stack trace of thread 3987665:#012#0  0x00007fe2945ba116 epoll_wait (libc.so.6 + 0xfe116)#012#1  0x00007fe294117b3f n/a (libuv.so.1 + 0x20b3f)#012#2  0x00007fe294106714 uv_run (libuv.so.1 + 0xf714)#012#3  0x00007fe2947322d3 _run_uv_loop (transport_hep.so + 0x52d3)#012#4  0x00007fe294689ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#5  0x00007fe2945b9def __clone (libc.so.6 + 0xfddef)#012#012Stack trace of thread 3987664:#012#0  0x00007fe2945b18b3 __select (libc.so.6 + 0xf58b3)#012#1  0x000055a3776cb114 main (captagent + 0x4114)#012#2  0x00007fe2944e2d0a __libc_start_main (libc.so.6 + 0x26d0a)#012#3  0x000055a3776ca44a _start (captagent + 0x344a)#012#012Stack trace of thread 3987666:#012#0  0x00007fe294581c61 clock_nanosleep (libc.so.6 + 0xc5c61)#012#1  0x00007fe294587443 __nanosleep (libc.so.6 + 0xcb443)#012#2  0x00007fe29458737a sleep (libc.so.6 + 0xcb37a)#012#3  0x00007fe2938bcc40 gather_data_run (database_hash.so + 0x6c40)#012#4  0x00007fe2938bcd8e timer_loop (database_hash.so + 0x6d8e)#012#5  0x00007fe294689ea7 start_thread (libpthread.so.0 + 0x8ea7)#012#6  0x00007fe2945b9def __clone (libc.so.6 + 0xfddef)
May 15 17:08:22 sp2 systemd[1]: systemd-coredump@1116-3988073-0.service: Succeeded.
May 15 17:08:22 sp2 systemd[1]: captagent.service: Main process exited, code=killed, status=11/SEGV
May 15 17:08:22 sp2 systemd[1]: captagent.service: Failed with result 'signal'.
May 15 17:08:22 sp2 systemd[1]: captagent.service: Scheduled restart job, restart counter is at 21.
May 15 17:08:22 sp2 systemd[1]: Stopped Captagent - monitoring system.
# dpkg -l |grep pcap
ii  libpcap-dev:amd64                                        1.10.0-2                                      amd64        development library for libpcap (transitional package)
ii  libpcap0.8:amd64                                         1.10.0-2                                      amd64        system interface for user-level packet capture
ii  libpcap0.8-dev:amd64                                     1.10.0-2                                      amd64        development library and header files for libpcap0.8

My config files follow. I left every other config file and capture plan untouched from version 6.4.1.

<?xml version="1.0"?>
<document type="captagent/xml">
        <configuration name="core.conf" description="CORE Settings" serial="2014024212">
            <settings>
                <param name="debug" value="1"/>
                <param name="version" value="2"/>
                <param name="serial" value="2014056501"/>
                <param name="uuid" value="00781a4a-5b69-11e4-9522-bb79a8fcf0f3"/>
                <param name="daemon" value="false"/>
                <param name="syslog" value="false"/>
                <param name="pid_file" value="/run/captagent.pid"/>
                <param name="module_path" value="/usr/local/captagent/lib/captagent/modules"/>
                <param name="config_path" value="/usr/local/captagent/etc/captagent"/>
                <param name="capture_plans_path" value="/usr/local/captagent/etc/captagent/captureplans"/>
                <param name="backup" value="/usr/local/captagent/etc/captagent/backup"/>
                <param name="chroot" value="/usr/local/captagent/etc/captagent"/>
            </settings>
        </configuration>
        <configuration name="modules.conf" description="Modules">
            <modules>

                <load module="transport_hep" register="local"/>
                <load module="protocol_sip" register="local"/>
                <load module="database_hash" register="local"/>
                <load module="protocol_rtcp" register="local"/>
                <load module="socket_pcap" register="local"/>
                <load module="protocol_tls" register="local"/>

        <!--
                <load module="protocol_rtcp" register="local"/>
                <load module="socket_rtcpxr" register="local"/>
                <load module="socket_raw" register="local"/>
                <load module="transport_json" register="local"/>
                <load module="protocol_rtcp" register="local"/>
                <load module="interface_http" register="local"/>
                <load module="database_redis" register="local"/>
                <load module="socket_pfring" register="local"/>
            -->
            </modules>
        </configuration>
</document>

socket_pcap.xml

<?xml version="1.0"?>
<document type="captagent_module/xml">
    <module name="socket_pcap" description="HEP Socket" serial="2014010402">
    <profile name="socketspcap_sip" description="HEP Socket" enable="true" serial="2014010402">
        <settings>
        <param name="dev" value="any"/>
        <param name="promisc" value="true"/>
        <param name="reasm" value="false"/>
        <param name="websocket-detection" value="false"/>
        <param name="tcpdefrag" value="false"/>
        <param name="erspan" value="false"/>
            <!-- <param name="capture-filter" value="ip_to_ip"/> -->
        <param name="capture-plan" value="sip_capture_plan.cfg"/>
        <param name="filter">
            <value>port 5060</value>
        </param>
        </settings>
    </profile>
    <profile name="socketspcap_rtcp" description="RTCP Socket" enable="false" serial="2014010402">
            <settings>
                <param name="dev" value="any"/>
                <param name="promisc" value="true"/>
                <param name="reasm" value="false"/>
                <!-- size in MB -->
                <param name="ring-buffer" value="20"/>
                <!-- for rtp && rtcp < 250 -->
                <param name="snap-len" value="256"/>
                <param name="capture-filter" value="rtcp"/>
                <param name="capture-plan" value="rtcp_capture_plan.cfg"/>
                <param name="filter">
                    <value>portrange 8000-30000 and len >=64 </value>
                </param>
            </settings>
        </profile>
    <profile name="socketspcap_tls" description="TLS Socket" enable="false" serial="2014010402">
        <settings>
        <param name="dev" value="any"/>
        <param name="promisc" value="true"/>
        <param name="reasm" value="false"/>
        <param name="tcpdefrag" value="true"/>
        <param name="capture-plan" value="tls_capture_plan.cfg"/>
        <param name="filter">
            <value>tcp port 5061</value>
        </param>
        </settings>
    </profile>
    <profile name="socketspcap_sctp" description="SCTP Socket" enable="false" serial="2014010402">
            <settings>
                <param name="dev" value="any"/>
                <param name="promisc" value="true"/>
                <param name="reasm" value="true"/>
                <param name="ipv4fragments" value="true"/>
                <param name="ipv6fragments" value="true"/>
                <param name="proto-type" value="sip"/>
                <param name="capture-plan" value="isup_capture_plan.cfg"/>
                <param name="filter">
                    <value>proto 132</value>
                </param>
            </settings>
        </profile>
    <profile name="socketspcap_diameter" description="DIAMETER Socket" enable="false" serial="2014010402">
            <settings>
                <param name="dev" value="any"/>
                <param name="promisc" value="true"/>
                <param name="reasm" value="false"/>
                <param name="tcpdefrag" value="true"/>
                <param name="capture-plan" value="diameter_capture_plan.cfg"/>
                <param name="filter">
                    <value>port 3868</value>
                </param>
            </settings>
        </profile>
    </module>
</document>

transport_hep.xml

<?xml version="1.0"?>
<document type="captagent_module/xml">
    <module name="transport_hep" description="HEP Protocol" serial="2014010402">
    <profile name="hepsocket" description="Transport HEP" enable="true" serial="2014010402">
        <settings>
        <param name="version" value="3"/>
        <param name="capture-host" value="homerxxx.xxx"/>
        <param name="capture-port" value="9060"/>
        <param name="capture-proto" value="udp"/>
        <param name="capture-id" value="2001"/>
        <param name="capture-password" value="myhep"/>
        <param name="payload-compression" value="false"/>
        </settings>
    </profile>
    </module>
</document>

protocol_sip.xml

<?xml version="1.0"?>
<document type="captagent_module/xml">
    <module name="protocol_sip" description="SIP Protocol" serial="2014010402">
        <profile name="proto_sip" description="PROTO SIP" enable="true" serial="2014010402">
            <settings>
                <param name="dialog-type" value="2"/>
                <param name="dialog-timeout" value="180"/>
            </settings>
        </profile>
    </module>
</document>
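
The kernel line in the log above packs several facts into one string. As a reading aid (not part of the original report), here is a minimal Python sketch that decodes it, assuming the standard x86 page-fault error-code bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). The function name decode_segfault and the dictionary keys are made up for illustration.

```python
import re

# Kernel log line copied from the report above.
LINE = ("captagent[3986495]: segfault at 0 ip 00007f480b0167de "
        "sp 00007f480a40ec00 error 4 in socket_pcap.so[7f480b015000+9000]")

# All numeric fields in this kernel message are hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel 'segfault at ...' line into its parts (illustrative helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_address": int(m["addr"], 16),
        # Offset of the faulting instruction inside the mapped region.
        "offset_in_mapping": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),               # bit 0
        "access": "write" if err & 2 else "read",   # bit 1
        "mode": "user" if err & 4 else "kernel",    # bit 2
    }


if __name__ == "__main__":
    info = decode_segfault(LINE)
    for key, value in info.items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{key}: {shown}")
```

For the line above this reports a user-mode read of address 0 (a NULL pointer dereference) at offset 0x17de into the mapped region of socket_pcap.so. The systemd-coredump trace shows callback_proto at socket_pcap.so + 0x47de, which is relative to the library's load base rather than to this mapping, so the two offsets differ by 0x3000.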

lmangani commented 1 year ago

Hey @ciscospirit do you have the coredump file available?

ciscospirit commented 1 year ago

Can you tell me how to create one? I enabled it in /etc/default/captagent, but no core dump is generated.

BTW, apart from this problem: if I only need SIP, wouldn't it be easier to use heplify instead of captagent?

lmangani commented 1 year ago

@ciscospirit Sure, for simple SIP-only jobs heplify is always a great option with no requirements.

ciscospirit commented 1 year ago

So should we stop debugging this for now and switch to heplify? Could you give me instructions on how to use it just for SIP?

is this enough? ./heplify -hs homer.xxx.xxx:9060 -nt tls -m SIP -hn sp1

or do i need something more?

lmangani commented 1 year ago

I'm not sure you should use the tls profile, unless the homer side is configured to receive it. Start simple:

./heplify -hs homer.xxx.xxx:9060 -m SIP -hn sp1 -hi 1001

kYroL01 commented 1 year ago

@ciscospirit I just tested the configuration you have, with ONLY SIP enabled in socket_pcap.xml, and I don't see any problem. Can you install coredumpctl and check whether a core dump was created? Second, it's always good practice to put the actual network device name in dev, e.g. <param name="dev" value="eth0"/>, rather than leaving it as any.

Also, remove this <load module="protocol_tls" register="local"/> from rtpagent.xml

kYroL01 commented 1 year ago

Anyway for heplify, you can check here for Usage and Examples.

ciscospirit commented 1 year ago

Hey, as we have several neth adapters for subscribers and peerings, I thought "any" would be the better choice?

What about the rest of the modules? Are they all needed for SIP only?

                <load module="transport_hep" register="local"/>
                <load module="protocol_sip" register="local"/>
                <load module="database_hash" register="local"/>
                <load module="protocol_rtcp" register="local"/>
                <load module="socket_pcap" register="local"/>

I think database_hash and protocol_rtcp are useless here, or not?

lmangani commented 1 year ago

any is only a good option if you don't have actual traffic on localhost as well as real interfaces. If you only want SIP, none of the modules matter. You can switch without special settings.

ciscospirit commented 1 year ago

root@sp2:/# coredumpctl list
No journal files were found.
No coredumps found.

kYroL01 commented 1 year ago

@ciscospirit These are the modules that MUST be present in rtpagent.xml to have a minimum working config

<load module="transport_hep" register="local"/>
<load module="protocol_sip" register="local"/>
<load module="database_hash" register="local"/>
<load module="protocol_rtcp" register="local"/>
<load module="socket_pcap" register="local"/>

but if you only want SIP you just need to have this setup in socket_pcap.xml

<profile name="socketspcap_sip" description="HEP Socket" enable="true" serial="2014010402">
<profile name="socketspcap_rtcp" description="HEP Socket" enable="false" serial="2014010402">

Also, any may not be a good fit for some VLAN traffic; it depends on various factors.

Btw the coredump could be generated on the next stop of captagent.

My 5c here is that some SIP traffic is causing this because it has some VLAN tags and has an issue with any.

I just tested with normal SIP on 5060 and no issue occurred.
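
A quick way to see which profiles are actually active, and whether any of them still capture on any, is to parse socket_pcap.xml. The following is a minimal sketch in Python using only the standard library. The function name enabled_profiles and the trimmed SAMPLE document are illustrative; in practice you would read your own socket_pcap.xml instead.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for socket_pcap.xml, based on the config posted above.
SAMPLE = """<?xml version="1.0"?>
<document type="captagent_module/xml">
  <module name="socket_pcap" description="HEP Socket" serial="2014010402">
    <profile name="socketspcap_sip" description="HEP Socket" enable="true" serial="2014010402">
      <settings>
        <param name="dev" value="any"/>
        <param name="filter">
          <value>port 5060</value>
        </param>
      </settings>
    </profile>
    <profile name="socketspcap_rtcp" description="RTCP Socket" enable="false" serial="2014010402">
      <settings>
        <param name="dev" value="any"/>
      </settings>
    </profile>
  </module>
</document>
"""


def enabled_profiles(xml_text):
    """Return (profile name, dev, filter) for every profile with enable="true"."""
    root = ET.fromstring(xml_text)
    result = []
    for profile in root.iter("profile"):
        if profile.get("enable") != "true":
            continue
        params = {p.get("name"): p for p in profile.iter("param")}
        dev = params["dev"].get("value") if "dev" in params else None
        bpf = ""
        if "filter" in params:
            bpf = params["filter"].findtext("value", default="").strip()
        result.append((profile.get("name"), dev, bpf))
    return result


if __name__ == "__main__":
    for name, dev, bpf in enabled_profiles(SAMPLE):
        note = "  <- consider a named interface" if dev == "any" else ""
        print(f"{name}: dev={dev} filter={bpf!r}{note}")
```

XML comments are skipped by the parser, so commented-out entries do not show up in the result.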

ciscospirit commented 1 year ago

Yes, we have a lot of VLANs and bonds configured... so in your opinion, which interface should I use? Can I list several interfaces there, like neth0,neth1,neth2?

lmangani commented 1 year ago

@ciscospirit I suggest moving this issue to heplify if that's the subject, and reading the docs and examples there.

ciscospirit commented 1 year ago

We are still trying to solve it with captagent and the bond/VLAN/neth configuration since, as kYro thinks, it is a problem with the VLANs.

We have neth8 and neth9, which form bonding0, and neth0 and neth2 for bond2, so I am not sure how to configure socket_pcap.xml for several devices.

Can I just duplicate the " " once for every device, or can I use several devices in one param setting?

kYroL01 commented 1 year ago

@ciscospirit no, you cannot list neth0,neth1,...,nethN in dev. You need to define multiple SIP profiles, each one bound to its own interface. This has some limitations, as socket_pcap uses the libpcap library and the application is not multithreaded, so it may not be able to keep up with too much traffic, but it's worth a try. Plus, if you have VLANs you must account for them in the BPF filter.

That said, you can try something like this in socket_pcap.xml:

<profile name="socketspcap_sip0" description="HEP Socket" enable="true" serial="2014010402">
        <settings>
        <param name="dev" value="neth0"/>
        <param name="promisc" value="true"/>
        <param name="reasm" value="false"/>
        <param name="websocket-detection" value="false"/>
        <param name="tcpdefrag" value="false"/>
        <param name="erspan" value="false"/>
            <!-- <param name="capture-filter" value="ip_to_ip"/> -->
        <param name="capture-plan" value="sip_capture_plan.cfg"/>
        <param name="filter">
            <value>vlan and port 5060 and length >= 64</value>
        </param>
        </settings>
    </profile>
    <profile name="socketspcap_sip1" description="HEP Socket" enable="true" serial="2014010402">
        <settings>
        <param name="dev" value="neth1"/>
        <param name="promisc" value="true"/>
        <param name="reasm" value="false"/>
        <param name="websocket-detection" value="false"/>
        <param name="tcpdefrag" value="false"/>
        <param name="erspan" value="false"/>
            <!-- <param name="capture-filter" value="ip_to_ip"/> -->
        <param name="capture-plan" value="sip_capture_plan.cfg"/>
        <param name="filter">
            <value>vlan and port 5060 and length >= 64</value>
        </param>
        </settings>
    </profile>

NOTE: each profile has its own name (socketspcap_sip0 and socketspcap_sip1).

NOTE1: it is recommended to set a minimum length to filter out bad or corrupted packets (the minimum size of an Ethernet frame carrying a payload is usually 64 bytes).

NOTE2: That VLAN BPF filter is intended to catch ONLY SIP traffic that carries a VLAN tag. If you want to catch both tagged and untagged traffic, this should be the correct filter: (ip and port 5060 and len >= 64) or (vlan and port 5060 and len >= 64)
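
With many interfaces, writing one profile per device by hand gets repetitive. As a rough sketch (not an official captagent tool), the Python below generates one SIP profile per interface, using the combined filter from NOTE2. The helper name sip_profile and the interface list are assumptions for illustration; the parameter names and values are copied from the profiles posted above. The untagged clause comes first because, per the pcap-filter documentation, the vlan keyword shifts the decoding offsets for the rest of the expression.

```python
import xml.etree.ElementTree as ET

# Filter from NOTE2: SIP on port 5060, with or without a VLAN tag.
SIP_FILTER = "(ip and port 5060 and len >= 64) or (vlan and port 5060 and len >= 64)"


def sip_profile(index, dev, bpf=SIP_FILTER):
    """Build one <profile> element bound to a single interface (illustrative helper)."""
    profile = ET.Element("profile", {
        "name": f"socketspcap_sip{index}",  # a distinct name per profile
        "description": "HEP Socket",
        "enable": "true",
        "serial": "2014010402",
    })
    settings = ET.SubElement(profile, "settings")
    for name, value in [
        ("dev", dev),
        ("promisc", "true"),
        ("reasm", "false"),
        ("websocket-detection", "false"),
        ("tcpdefrag", "false"),
        ("erspan", "false"),
        ("capture-plan", "sip_capture_plan.cfg"),
    ]:
        ET.SubElement(settings, "param", {"name": name, "value": value})
    filter_param = ET.SubElement(settings, "param", {"name": "filter"})
    ET.SubElement(filter_param, "value").text = bpf
    return profile


if __name__ == "__main__":
    interfaces = ["neth0", "neth1"]  # example names taken from the thread
    for i, dev in enumerate(interfaces):
        element = sip_profile(i, dev)
        ET.indent(element, space="    ")  # requires Python 3.9+
        # ">" in the filter is serialized as "&gt;", which is equivalent XML.
        print(ET.tostring(element, encoding="unicode"))
```

The printed elements would still need to be pasted inside the existing <module name="socket_pcap"> element, and whether the physical interfaces or the bond devices are the right capture points depends on the setup.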

Let me know if this helps. In any case, do some tests on your side; I'm sure there's a configuration that does the trick for you :)

Regards