sipcapture / captagent

100% Open-Source Packet Capture Agent for HEP
https://sipcapture.org
GNU Affero General Public License v3.0

Segfault with captagent version 6.4.1 on Debian 11 #274

Open sivagurudialpad opened 1 year ago

sivagurudialpad commented 1 year ago

Hi,

I recently upgraded my OS from Debian 10 to Debian 11 and started noticing coredumps from captagent after the upgrade. I am using captagent version 6.4.1, running within a Kubernetes pod (mentioning it here in case it makes any difference). I have included the details below. Please let me know if you require further information.

Version info

# /usr/local/captagent/sbin/captagent -v
version: 6.4.1

OS info

root@prober-phase3-kube-api-production-eqx-sjc-6684c49d9f-7gw5l:/usr/local/prober# cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Backtrace full

(gdb) bt full
#0  0x00007fbfa57c86d8 in callback_proto (arg=0x7fbfa4bc2e64 "", pkthdr=0x7fbfa4bc2ba0, packet=0x7fbfa5304524 "") at socket_pcap.c:555
        _msg = {data = 0x7fbfa5304340, profile_name = 0x559ff07abab0 "hepsocket", len = 396, hdr_len = 44, tcpflag = 0 '\000', sctp_ppid = 0, 
          rcinfo = {ip_family = 2 '\002', ip_proto = 17 '\021', proto_type = 1 '\001', src_mac = 0x7fbfa4bc1830 "02-55-17-E5-D1-72", 
            dst_mac = 0x7fbfa4bc1810 "", src_ip = 0x7fbfa4bc1880 "170.10.200.68", dst_ip = 0x7fbfa4bc1850 "10.32.149.51", src_port = 5060, 
            dst_port = 9638, time_sec = 1698460986, time_usec = 372569, liid = 0, cval1 = 0, cval2 = 0, sessionid = 0, direction = 0 '\000', 
            uuid = 0x0, correlation_id = {s = 0x0, len = 0}, tags = {s = '\000' <repeats 127 times>, len = 0}, socket = 0x0}, 
          parse_it = 1 '\001', parsed_data = 0x0, sip = {responseCode = 0, isRequest = true, validMessage = true, methodType = ACK, 
            methodString = {s = 0x7fbfa5304340 "", len = 3}, method_len = 0, callId = {s = 0x7fbfa5304450 "", len = 36}, reason = {s = 0x0, 
              len = 0}, hasSdp = false, cdm = {{name = '\000' <repeats 119 times>, id = 0, rate = 0, next = 0x0} <repeats 20 times>}, mrp = {{
                media_ip = {s = 0x0, len = 0}, media_port = 0, rtcp_ip = {s = 0x0, len = 0}, rtcp_port = 0, prio_codec = 0} <repeats 20 times>}, 
            cdm_count = 0, mrp_size = 0, contentLength = 0, len = 0, cSeqNumber = 74676698, hasVqRtcpXR = false, rtcpxr_callid = {s = 0x0, 
              len = 0}, cSeqMethodString = {s = 0x7fbfa5304485 "", len = 3}, cSeqMethod = ACK, cSeq = {s = 0x7fbfa530447c "", len = 12}, via = {
              s = 0x0, len = 0}, contactURI = {s = 0x0, len = 0}, ruriUser = {s = 0x7fbfa5304348 "", len = 0}, ruriDomain = {
              s = 0x7fbfa5304348 "", len = 13}, fromUser = {s = 0x7fbfa53043d8 "", len = 12}, fromDomain = {s = 0x7fbfa53043e5 "", len = 13}, 
            toUser = {s = 0x7fbfa5304410 "", len = 12}, toDomain = {s = 0x7fbfa530441d "", len = 13}, userAgent = {s = 0x0, len = 0}, paiUser = {
              s = 0x0, len = 0}, paiDomain = {s = 0x0, len = 0}, requestURI = {s = 0x7fbfa5304344 "", len = 36}, customHeader = {s = 0x0, 
              len = 0}, hasCustomHeader = false, pidURI = {s = 0x0, len = 0}, hasPid = false, fromURI = {s = 0x7fbfa53043cc "", len = 57}, 
            hasFrom = true, toURI = {s = 0x7fbfa530440b "", len = 58}, hasTo = true, ruriURI = {s = 0x0, len = 0}, hasRuri = false, toTag = {
              s = 0x7fbfa5304435 "", len = 13}, hasToTag = true, fromTag = {s = 0x7fbfa53043f8 "", len = 10}, hasFromTag = true}, 
          cap_packet = 0x7fbfa5304314, cap_header = 0x7fbfa4bc2ba0, var = 0x0, corrdata = 0x0, mfree = 0 '\000', flag = {0, 0, 0, 0, 0, 0, 0, 0, 
            0, 0}}
        eth = 0x0
        sll = 0x7fbfa5304524
        ip4_pkt = 0x0
        ip6_pkt = 0x0
        ctx = {route_rec_lev = 0, rec_lev = 0, run_flags = 0, last_retcode = 0}
        ip_src = "170.10.200.68", '\000' <repeats 33 times>
        ip_dst = "10.32.149.51", '\000' <repeats 34 times>
        mac_src = "02-55-17-E5-D1-72\000\000"
        mac_dst = '\000' <repeats 19 times>
        ip_ver = 4
        ipip_offset = 0
        action_idx = 0
        type_ip = 0
        hdr_preset = 0 '\000'
        hdr_offset = 4 '\004'
        vlan = 2 '\002'
        ip_proto = 0 '\000'
        erspan_offset = 0 '\000'
        tmp_ip_proto = 0 '\000'
--Type <RET> for more, q to quit, c to continue without paging--
        tmp_ip_len = 0 '\000'
        is_only_gre = 0 '\000'
        ethaddr = 0x81 <error: Cannot access memory at address 0x81>
        mplsaddr = 0x45 <error: Cannot access memory at address 0x45>
        loc_index = 0 '\000'
        len = 1183
        ip_hl = 0
        ip_off = 0
        frag_offset = 0
        fragmented = 0 '\000'
        psh = 0 '\000'
        data = 0x7fbfa5304340 ""
        datatcp = 0x1600000028 <error: Cannot access memory at address 0x1600000028>
        pack = 0x0
#1  0x00007fbfa6e58c05 in ?? () from /usr/lib/x86_64-linux-gnu/libpcap.so.0.8
No symbol table info available.
#2  0x00007fbfa6e59074 in ?? () from /usr/lib/x86_64-linux-gnu/libpcap.so.0.8
No symbol table info available.
#3  0x00007fbfa6e5fb0e in pcap_loop () from /usr/lib/x86_64-linux-gnu/libpcap.so.0.8
No symbol table info available.
#4  0x00007fbfa57cab75 in proto_collect (arg=0x559ff07a4130) at socket_pcap.c:1267
        loc_idx = 0
        ret = 0
        is_file = 0
#5  0x00007fbfa6dffea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140461079279360, -749651851201127914, 140725320596318, 140725320596319, 140461079277376, 
                8396800, 785872520026872342, 785876086468336150}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, 
              cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#6  0x00007fbfa6d1fa2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
(gdb)

Thread information

(gdb) info thr
  Id   Target Id                      Frame 
* 1    Thread 0x7fbfa4bc3700 (LWP 47) 0x00007fbfa57c86d8 in callback_proto (arg=0x7fbfa4bc2e64 "", pkthdr=0x7fbfa4bc2ba0, 
    packet=0x7fbfa5304524 "") at socket_pcap.c:555
  2    Thread 0x7fbfa6860700 (LWP 44) 0x00007fbfa6d1fd56 in epoll_wait (epfd=3, events=0x7fbfa685cda0, maxevents=1024, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 0x7fbfa2fc2700 (LWP 48) 0x00007fbfa6d1396f in __GI___poll (fds=0x7fbfa2fc1d30, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
  4    Thread 0x7fbfa689f000 (LWP 42) 0x00007fbfa6d15e23 in __GI___select (nfds=0, readfds=0x0, writefds=0x0, exceptfds=0x0, timeout=0x0)
    at ../sysdeps/unix/sysv/linux/select.c:41
  5    Thread 0x7fbfa5fd6700 (LWP 45) 0x00007fbfa6ce61a1 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, 
    req=req@entry=0x7fbfa5fd5de0, rem=rem@entry=0x7fbfa5fd5de0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48

Configuration

<?xml version="1.0"?>
<document type="captagent/xml">
        <configuration name="core.conf" description="CORE Settings" serial="2014024212">
            <settings>
                <param name="debug" value="3"/>
                <param name="version" value="2"/>
                <param name="serial" value="2014056501"/>
                <param name="uuid" value="00781a4a-5b69-11e4-9522-bb79a8fcf0f3"/>
                <param name="daemon" value="false"/>
                <param name="syslog" value="false"/>
                <param name="pid_file" value="/var/run/captagent.pid"/>
                <!-- Configure using installation path if different from default -->
                <param name="module_path" value="/usr/local/captagent/lib/captagent/modules"/>
                <param name="config_path" value="/usr/local/captagent/etc/captagent/"/>
                <param name="capture_plans_path" value="/usr/local/captagent/etc/captagent/captureplans"/>
                <param name="backup" value="/usr/local/captagent/etc/captagent/backup"/>
                <param name="chroot" value="/usr/local/captagent/etc/captagent"/>
            </settings>
        </configuration>
        <configuration name="modules.conf" description="Modules">
            <modules>

                <load module="transport_hep" register="local"/>
                <load module="protocol_sip" register="local"/>
                <load module="database_hash" register="local"/>
                <load module="protocol_rtcp" register="local"/>
                <load module="socket_pcap" register="local"/>

                <!-- NOTE: Block required for RTCPXR socket + RTCPXR protocol -->
                <!-- 
                        <load module="protocol_rtcpxr" register="local"/>
                        <load module="socket_collector" register="local"/> 
                -->

                <!--
                <load module="socket_tzsp" register="local"/>
                <load module="protocol_ss7" register="local"/>
                <load module="protocol_diameter" register="local"/>
                <load module="protocol_tls" register="local"/>
                <load module="output_json" register="local"/>
                <load module="interface_http" register="local"/>
                <load module="database_redis" register="local"/>
                -->
        </modules>
        </configuration>
</document>

Corefile captagent.corefile.sig11.42.zip

sivagurudialpad commented 1 year ago

I found the following, which seems to be related:

My socket_pcap.xml file sets <param name="dev" value="any"/> in all of the modules. However, I use the same configuration on Debian 10, and it did not coredump there.
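
For context on that setting: the backtrace above shows eth = 0x0 while sll points at the packet data, and Debian 11 ships libpcap 1.10 while Debian 10 shipped libpcap 1.8. If I understand the libpcap 1.10 changes correctly, capturing on the "any" pseudo-device now yields the newer DLT_LINUX_SLL2 "cooked" link type, whereas the older library yielded DLT_LINUX_SLL. Whether that difference is the actual cause here is not confirmed. The snippet below is only a minimal, self-contained sketch of a libpcap callback that dispatches on pcap_datalink() before parsing; it is illustrative and is not captagent's code.

/* Minimal sketch (illustrative, not captagent code): a pcap callback that
 * dispatches on the link-layer type before parsing, so captures on the
 * "any" pseudo-device (Linux "cooked" SLL/SLL2 headers) are not treated
 * as Ethernet frames. Build with: gcc sll_demo.c -lpcap */
#include <pcap/pcap.h>
#include <pcap/sll.h>
#include <stdint.h>
#include <stdio.h>

static int g_datalink; /* set once from pcap_datalink() after the handle is opened */

static void callback(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
{
    size_t   l2_len;    /* length of the link-layer header */
    size_t   proto_off; /* offset of the protocol/ethertype field */
    uint16_t proto;

    (void)user;

    switch (g_datalink) {
    case DLT_EN10MB:      l2_len = 14;          proto_off = 12; break; /* plain Ethernet */
    case DLT_LINUX_SLL:   l2_len = SLL_HDR_LEN; proto_off = 14; break; /* "any", older libpcap: 16-byte cooked header */
#ifdef DLT_LINUX_SLL2
    case DLT_LINUX_SLL2:  l2_len = 20;          proto_off = 0;  break; /* "any", libpcap >= 1.10: 20-byte cooked header */
#endif
    default:
        return; /* unknown link type: skip rather than dereference a bogus pointer */
    }

    if (h->caplen < l2_len)
        return; /* truncated capture: nothing safe to parse */

    proto = (uint16_t)((bytes[proto_off] << 8) | bytes[proto_off + 1]);
    printf("link hdr %zu bytes, ethertype 0x%04x, payload %u bytes\n",
           l2_len, proto, (unsigned)(h->caplen - l2_len));
}

int main(int argc, char **argv)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    const char *dev = (argc > 1) ? argv[1] : "any";
    pcap_t *ph = pcap_open_live(dev, 65535, 1, 1000, errbuf);

    if (ph == NULL) {
        fprintf(stderr, "pcap_open_live(%s): %s\n", dev, errbuf);
        return 1;
    }
    g_datalink = pcap_datalink(ph); /* DLT_EN10MB, DLT_LINUX_SLL, DLT_LINUX_SLL2, ... */
    pcap_loop(ph, -1, callback, NULL);
    pcap_close(ph);
    return 0;
}

Running it as root on both Debian 10 and Debian 11 (e.g. ./a.out any) should show which link-layer type each libpcap version reports for the "any" device.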

lmangani commented 1 year ago

@sivagurudialpad thanks for the report, our devs will take a look, but I would consider using heplify instead since it's lighter and more portable.
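
For reference, a minimal heplify invocation pointing at a HEP collector might look like the line below; the interface name and the HEP server address are placeholders, and the exact flags should be checked against the heplify README:

# ./heplify -i any -hs 10.0.0.1:9060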

sivagurudialpad commented 1 year ago

@lmangani Thank you for your quick response. I will certainly take a look at heplify and see if it can be used instead of captagent.

sivagurudialpad commented 11 months ago

@lmangani I wanted to check with you if there are any updates regarding this ticket.

kYroL01 commented 11 months ago

Hi @sivagurudialpad, not yet. I will check ASAP. Thank you

sivagurudialpad commented 11 months ago

Hi @kYroL01. Thank you for taking a look at this issue. I wanted to check with you if there are any updates regarding this ticket.

anupamdialpad commented 10 months ago

I see "Testing needed" label has been added. @kYroL01 We can try deploying the build if it is available

kYroL01 commented 10 months ago

Hi @anupamdialpad, not yet, but I'll manage it.

sivagurudialpad commented 9 months ago

Hi @kYroL01. Thank you for taking a look at this issue. I wanted to check with you if there are any updates regarding this ticket.

kYroL01 commented 9 months ago

Hi @sivagurudialpad, we're looking into it. I was able to reproduce it and will work on that. Thank you

sivagurudialpad commented 9 months ago

Hi @kYroL01. Thank you very much for the update. It is good to know that it was reproducible.

sivagurudialpad commented 8 months ago

Hi @kYroL01. I wanted to check with you if there are any updates regarding this ticket.

sivagurudialpad commented 7 months ago

Hi @kYroL01. I wanted to check with you if there are any updates regarding this ticket.

sivagurudialpad commented 6 months ago

Hi @kYroL01. I wanted to check with you if there are any updates regarding this ticket. We have been following up on it since Oct 2023, and we have not been able to upgrade to the latest version due to this issue. Could we please get a fix for it?

kYroL01 commented 5 months ago

Hi @sivagurudialpad, I was very busy with other higher-priority tasks, but I will take a look and get a fix out in the next few weeks.

sivagurudialpad commented 5 months ago

Hi @kYroL01. I wanted to check with you if you got a chance to look into this ticket.

sivagurudialpad commented 4 weeks ago

Hi @kYroL01, @lmangani. I wanted to check with you if you got a chance to look into this ticket. It was opened in Oct 2023, so a year has passed. We would really appreciate it if we could get a fix for this issue.

lmangani commented 4 weeks ago

> Hi @kYroL01, @lmangani. I wanted to check with you if you got a chance to look into this ticket. It was opened in Oct 2023, so a year has passed. We would really appreciate it if we could get a fix for this issue.

Thanks for the nudge @sivagurudialpad. This is not for lack of interest, but our team can only realistically work on issues affecting multiple users. Until that happens, as suggested just as long ago, I would invite you to consider using heplify instead, since it's lighter, maintained, and more portable.