sipcapture / captagent

100% Open-Source Packet Capture Agent for HEP
https://sipcapture.org
GNU Affero General Public License v3.0

Segmentation fault in socket_collector #156

Closed miken32 closed 7 years ago

miken32 commented 7 years ago

Trying to set up this software for the first time; I think I've got everything set up properly, but I get a segfault as soon as a RTCP-XR report comes in.

Oct 24 15:18:34 marceline kernel: captagent[14000]: segfault at 7f999bf942e8 ip 00007f999bd8d9aa sp 00007f999bd85a50 error 4 in socket_collector.so[7f999bd8b000+4000]

Running on Scientific Linux 6.9.

/usr/local/captagent/etc/captagent/captagent.xml

<?xml version="1.0"?>
<document type="captagent/xml">
    <configuration name="core.conf" description="CORE Settings" serial="2014024212">
        <settings>
        <param name="debug" value="128"/>
        <param name="version" value="2"/>
        <param name="serial" value="2014056501"/>
        <param name="uuid" value="00781a4a-5b69-11e4-9522-bb79a8fcf0f3"/>
        <param name="daemon" value="false"/>
        <param name="syslog" value="true"/>
        <param name="pid_file" value="/var/run/captagent.pid"/>
        <!-- Configure using installation path if different from default -->
        <param name="module_path" value="/usr/local/captagent/lib/captagent/modules"/>
        <param name="config_path" value="/usr/local/captagent/etc/captagent/"/>
        <param name="capture_plans_path" value="/usr/local/captagent/etc/captagent/captureplans"/>
        <param name="backup" value="/usr/local/captagent/etc/captagent/backup"/>
        <param name="chroot" value="/usr/local/captagent/etc/captagent"/>
        </settings>
    </configuration>
    <configuration name="modules.conf" description="Modules">
        <modules>
        <load module="transport_hep" register="local"/>
        <load module="protocol_sip" register="local"/>
        <load module="database_hash" register="local"/>
        <load module="protocol_rtcp" register="local"/> 
        <load module="socket_pcap" register="local"/>
        <load module="protocol_rtcpxr" register="local"/>
        <load module="socket_collector" register="local"/>
        </modules>
    </configuration>
</document>

/usr/local/captagent/etc/captagent/socket_collector.xml

<?xml version="1.0"?>
<document type="captagent_module/xml">
    <module name="socket_collector" description="Collector Socket" serial="2014010402">
    <profile name="sockets_collector_rtcpxr" description="RTCPXR local" enable="true" serial="2014010402">
        <settings>
        <param name="host" value="0.0.0.0"/>
        <param name="port" value="5060"/>
        <param name="proto" value="udp"/>
        <param name="method-publish" value="true"/>
        <param name="short-report" value="true"/>
        <param name="reply" value="true"/>
        <param name="capture-plan" value="rtcpxr_capture_plan.cfg"/>
        </settings>
    </profile>
    </module>
</document>

Typical RTCP-XR call report:

PUBLISH sip:12.34.62.205:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.244.100;branch=z9hG4bK37f6c6b24591A2DB
From: "7040" <sip:7040@mypbx.ca>;tag=620D0AF0-A45569B9
To: <sip:12.34.62.205:5060>
CSeq: 1 PUBLISH
Call-ID: 6d9f4e9f51f06fc907093d461a84fffe
Contact: <sip:7040@192.168.244.100>
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
Event: vq-rtcpxr
User-Agent: PolycomVVX-VVX_600-UA/5.5.2.8571_0004f284fffe
Accept-Language: en
Max-Forwards: 70
Expires: 3600
Content-Type: application/vq-rtcpxr
Content-Length: 804

VQSessionReport: CallTerm
CallID:34cebe96350f796b6d3088c2f684fffe
LocalID: "7040" <sip:7040@mypbx.ca>
RemoteID: <sip:+442035982801@mypbx.ca>
OrigID: "7040" <sip:7040@mypbx.ca>
LocalGroup: Victoria Office
LocalAddr:IP= 192.168.244.100 PORT=2242 SSRC=3129383056
LocalMAC: 0004f284fffe
RemoteAddr:IP= 12.34.62.153 PORT=17538 SSRC=1266716344
LocalMetrics:
TimeStamps:START=2017-10-24T19:03:40Z STOP=2017-10-24T19:04:03Z
SessionDesc:PT=9 PPS=50 SSUP=off
JitterBuffer:JBA=3 JBR=5 JBN=150 JBM=150 JBX=160
PacketLoss:NLR=0.0 JDR=0.0
BurstGapLoss:BLD=54.2 BD=110 GLD=0.0 GD=11000 GMIN=16
Delay:RTD=68 ESD=169 OWD=203 IAJ=5
Signal:RERL=127
QualityEst:RLQ=93 RCQ=91 MOSLQ=3.8 MOSCQ=3.7
DialogID:34cebe96350f796b6d3088c2f684fffe;to-tag=as6e03af3b;from-tag=A4A51024-99CB92AD
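
For readers unfamiliar with the format: the body above is the plain-text voice quality report carried by the SIP vq-rtcpxr event package (RFC 6035), not a binary RTCP packet. The following is a minimal sketch of how such a body could be turned into a dictionary. It is not captagent's parser; the function name `parse_vq_report`, the `KEY_VALUE_LINES` set, and the shortened `SAMPLE` are assumptions made for illustration.

```python
import re

# Shortened copy of the report body shown above (illustrative subset).
SAMPLE = """\
VQSessionReport: CallTerm
CallID:34cebe96350f796b6d3088c2f684fffe
LocalAddr:IP= 192.168.244.100 PORT=2242 SSRC=3129383056
LocalMetrics:
JitterBuffer:JBA=3 JBR=5 JBN=150 JBM=150 JBX=160
PacketLoss:NLR=0.0 JDR=0.0
QualityEst:RLQ=93 RCQ=91 MOSLQ=3.8 MOSCQ=3.7
"""

# Lines whose value is a list of KEY=VALUE pairs (assumed set, based on the sample).
KEY_VALUE_LINES = {
    "LocalAddr", "RemoteAddr", "TimeStamps", "SessionDesc", "JitterBuffer",
    "PacketLoss", "BurstGapLoss", "Delay", "Signal", "QualityEst",
}

# Tolerates the space Polycom puts after "IP=".
PAIR_RE = re.compile(r"(\w+)=\s*(\S+)")


def parse_vq_report(body):
    """Return {line name: str or {key: value}} for a vq-rtcpxr text body."""
    report = {}
    for line in body.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # e.g. a bare "VQSessionReport" line in the draft format
        name, rest = name.strip(), rest.strip()
        if name in KEY_VALUE_LINES:
            report[name] = dict(PAIR_RE.findall(rest))
        else:
            report[name] = rest
    return report


report = parse_vq_report(SAMPLE)
print(report["QualityEst"])
```

All values are kept as strings; converting MOS scores or jitter values to numbers is left to the caller.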

/etc/captagent/protocol_rtcpxr.xml and /etc/captagent/captureplans/rtcpxr_capture_plan.cfg are both default. Please advise if I can provide any more detail. Setting debug to 128 didn't seem to help with more verbose logging.

lmangani commented 7 years ago

Hi @miken32, thanks for the details; we're investigating this case. Do you have a coredump for the original failure? Removing the other modules will make things even more complex to troubleshoot.

miken32 commented 7 years ago

Oct 24 18:51:17 marceline kernel: captagent[20897]: segfault at 7f85922cd848 ip 00007f85920c79aa sp 00007f85920bfa50 error 4 in socket_collector.so[7f85920c5000+4000]

core.20893.zip

kYroL01 commented 7 years ago

@miken32 try after this https://github.com/sipcapture/captagent/commit/5333ddf76f21a153d71131f71ed1793180f784cf

miken32 commented 7 years ago

Still the same result using new config files (after git pull and recompile as well.) Attached is latest core dump.

segfault at 7fc2dcbc98b0 ip 00007fc2dc9c19aa sp 00007fc2dc9b9a50 error 4 in socket_collector.so[7fc2dc9bf000+4000]

core.23909.zip
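
A side note on reading these kernel lines: subtracting the module's load address (the value in square brackets) from the instruction pointer gives the offset of the faulting instruction inside socket_collector.so. The sketch below does that for the three lines reported so far in this thread. The regular expression and the function name `module_offset` are illustrative, and the result is only meaningful as a file offset if the mapping shown starts at the beginning of the shared object, which is assumed here.

```python
import re

# Kernel segfault lines copied from the reports in this thread.
LOG_LINES = [
    "captagent[14000]: segfault at 7f999bf942e8 ip 00007f999bd8d9aa "
    "sp 00007f999bd85a50 error 4 in socket_collector.so[7f999bd8b000+4000]",
    "captagent[20897]: segfault at 7f85922cd848 ip 00007f85920c79aa "
    "sp 00007f85920bfa50 error 4 in socket_collector.so[7f85920c5000+4000]",
    "segfault at 7fc2dcbc98b0 ip 00007fc2dc9c19aa "
    "sp 00007fc2dc9b9a50 error 4 in socket_collector.so[7fc2dc9bf000+4000]",
]

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>\d+) in (?P<module>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def module_offset(line):
    """Return (module name, instruction offset relative to the mapping base)."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        raise ValueError("not a kernel segfault line")
    offset = int(match.group("ip"), 16) - int(match.group("base"), 16)
    return match.group("module"), offset


offsets = [module_offset(line) for line in LOG_LINES]
for module, offset in offsets:
    print(module, hex(offset))
```

All three crashes land on the same offset, which suggests one faulting instruction rather than random memory corruption. On x86, error code 4 indicates a user-mode read of an unmapped page. With debug symbols available, an offset like this can usually be resolved to a source line with a tool such as addr2line.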

lmangani commented 7 years ago

Let's see your collector capture plan, please. Also, if possible, capture the packets that reach the agent socket and cause the crash so we can investigate.

kYroL01 commented 7 years ago

@miken32 the core you sent is good for your compiled captagent; however, I can see the segfault but not a usable backtrace from gdb:

#0 0x00007fc2dc9c19aa in ?? ()
#1 0x00007fc2dcbc3180 in ?? ()
#2 0x00007fc2dc9babb0 in ?? ()
#3 0x00007fc2d40008c0 in ?? ()
#4 0x0000000000010000 in ?? ()
#5 0x00007fc2d40008c0 in ?? ()
#6 0x0000000000000000 in ?? ()

Can you send me a pcap to reproduce the segfault and/or show me the backtrace from your gdb? You can also send me (by mail) your captagent binary. Thank you

miken32 commented 7 years ago

Apologies, this is all pretty foreign to me (I'm mostly a system administrator, with very little experience with this debugging stuff!)

So, I just set CFLAGS=-g before compile, ran debuginfo-install for the relevant libraries, and got this output from gdb:

#0  ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, 
    loc=0x3f9818ee40) at ../stdlib/strtol_l.c:298
#1  0x00007f0525bc59c3 in atoi (handle=<value optimized out>, nread=<value optimized out>, 
    rcvbuf=..., addr=0x7f0525bbebc0, flags=<value optimized out>) at /usr/include/stdlib.h:286
#2  on_recv (handle=<value optimized out>, nread=<value optimized out>, rcvbuf=..., 
    addr=0x7f0525bbebc0, flags=<value optimized out>) at socket_collector.c:244
#3  0x00007f0527e017d3 in uv__udp_recvmsg (loop=0x1ce2d30, w=0x7f0525dc6a90, revents=1)
    at ../src/unix/udp.c:242
#4  uv__udp_io (loop=0x1ce2d30, w=0x7f0525dc6a90, revents=1) at ../src/unix/udp.c:179
#5  0x00007f0527e027a4 in uv__io_poll (loop=0x1ce2d30, timeout=-1)
    at ../src/unix/linux-core.c:308
#6  0x00007f0527df72cc in uv_run (loop=0x1ce2d30, mode=<value optimized out>)
    at ../src/unix/core.c:317
#7  0x00007f0527df60ef in uv__thread_start (ctx_v=<value optimized out>)
    at ../src/uv-common.c:323
#8  0x0000003f98607aa1 in start_thread (arg=0x7f0525bc2700) at pthread_create.c:301
#9  0x0000003f97ee8bcd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Core dump, binary, and pcap are attached: debug.zip

kYroL01 commented 7 years ago

@miken32 Don't worry :) Thanks a lot, I'll check asap

miken32 commented 7 years ago

Oh, and as mentioned earlier, I'm using the default capture plan:

capture[collector] {

    # here we can check source/destination IP/port, message size
    if(msg_check("size", "10")) {

        # check if pkt is rtcp-xr
        if(is_rtcpxr()) {

            # if yes, parse the field and make a json output
            if(parse_rtcpxr_to_json()) {

                if(!send_hep("hepsocket")) {
                    clog("ERROR", "Error sending !!!!");
                }
            } else {
                clog("ERROR", "couldn't parse RTCP-XR to json");
            }
        } else {
            clog("ERROR", "This is not RTCP-RX");
        }

        # Do parsing
        if(parse_full_sip()) {

            # check if our method is PUBLISH
            if(sip_is_method() && sip_check("method","PUBLISH")) {

                # Currently we send reply automatically
                # send_rtcpxr_reply("200", "OK");

                # Can be defined many profiles in transport_hep.xml

                if(!send_hep_proto("hepsocket", "99")) {
                    clog("ERROR", "Error sending HEP!!!!");
                }

            } else {
                send_reply("503", "Server internal error");
            }
        }
    }
    drop;
}

adubovikov commented 7 years ago

Just to be sure, are you using the latest git?

miken32 commented 7 years ago

Yes, updated again this morning before recompiling and re-installing.

miken32 commented 7 years ago

Just to be sure, I set up my repo like this:

git clone https://github.com/sipcapture/homer.git
cd homer
git submodule init
git submodule update --init --recursive
git submodule foreach git pull origin master

And I am pulling updates by running the last two commands again. Is this correct?

lmangani commented 7 years ago

@miken32 I think we're talking about captagent, the HOMER installation method is a different deal and it's best to "split" the two for this troubleshooting. If you haven't done so, please pull and recompile captagent and try again.

miken32 commented 7 years ago

Yeah Homer links the captagent source as a sub module. I’ve confirmed the most recent changes are in my code.

kYroL01 commented 7 years ago

@miken32 can you try now? https://github.com/sipcapture/captagent/commit/1368c0e8074cb959a6d716ff17073fa2f586ecbb

BTW in your zip file there is still only the bin

miken32 commented 7 years ago

Will give it a try in a couple of minutes. I was actually doing this very thing – LDEBUG("LOC_IDX in ON_RCV = %d\n", loc_idx); – a few seconds ago, and noticed I was getting values of 0 (which seemed good) and 39 (which seemed bad.)

miken32 commented 7 years ago

That seems to have worked. Now I get past the previous point in the code. Next step is to figure out what is wrong with Polycom's packets...

Oct 30 13:34:32 marceline captagent[12187]: [ERR] protocol_rtcpxr.c:127 Wrong version
Oct 30 13:34:32 marceline captagent[12187]: [ERR] protocol_sip.c:133 This is not RTCP-RX

(I suspect that should be RTCP-XR in the error message!)

lmangani commented 7 years ago

Thanks @miken32 any chance you can share a few example packets from the polycoms with us, also privately at support@sipcapture.org if preferred, so we can check them out too?

miken32 commented 7 years ago

No problem, here's a sample packet. Nothing changed except the PBX hostname, collector IP address, and phone MAC address:

PUBLISH sip:12.34.62.205:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.244.100;branch=z9hG4bKd165d3558B89CFE
From: "7040" <sip:7040@mypbx.ca>;tag=6B39152F-396775C0
To: <sip:12.34.62.205:5060>
CSeq: 1 PUBLISH
Call-ID: b6d4a078b6835eb3131a62bf2c84fffe
Contact: <sip:7040@192.168.244.100>
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
Event: vq-rtcpxr
User-Agent: PolycomVVX-VVX_600-UA/5.5.2.8571_0004f284fffe
Accept-Language: en
Max-Forwards: 70
Expires: 3600
Content-Type: application/vq-rtcpxr
Content-Length: 789

VQSessionReport: CallTerm
CallID:ab647484a4aaea9e5e4266fc6084fffe
LocalID: "7040" <sip:7040@mypbx.ca>
RemoteID: <sip:6137451576@mypbx.ca>
OrigID: "7040" <sip:7040@mypbx.ca>
LocalGroup: Unknown
LocalAddr:IP= 192.168.244.100 PORT=2226 SSRC=3872460655
LocalMAC: 0004f284fffe
RemoteAddr:IP= 12.34.62.153 PORT=12140 SSRC=208925045
LocalMetrics:
TimeStamps:START=2017-10-30T17:50:01Z STOP=2017-10-30T17:50:11Z
SessionDesc:PT=9 PPS=50 SSUP=off
JitterBuffer:JBA=3 JBR=5 JBN=50 JBM=150 JBX=160
PacketLoss:NLR=0.0 JDR=6.2
BurstGapLoss:BLD=48.0 BD=1550 GLD=0.0 GD=5240 GMIN=16
Delay:RTD=66 ESD=65 OWD=98 IAJ=0
Signal:RERL=127
QualityEst:RLQ=79 RCQ=77 MOSLQ=3.3 MOSCQ=3.2
DialogID:ab647484a4aaea9e5e4266fc6084fffe;to-tag=as305083fd;from-tag=AE8EE81B-88492D5C

With RFC 6035 compliance disabled (uses "the existing draft implementation"):

PUBLISH sip:12.34.62.205:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.244.100;branch=z9hG4bKc57b7386867DB975
From: "7040" <sip:7040@mypbx.ca>;tag=3D016C9C-2A830963
To: <sip:12.34.62.205:5060>
CSeq: 1 PUBLISH
Call-ID: 68892a46cd6290ab018841c0c984fffe
Contact: <sip:7040@192.168.244.100>
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
Event: vq-rtcpxr
User-Agent: PolycomVVX-VVX_600-UA/5.5.2.8571_0004f284fffe
Accept-Language: en
Max-Forwards: 70
Expires: 3600
Content-Type: application/vq-rtcpxr
Content-Length: 678

VQSessionReport
LocalMetrics:
TimeStamps:START=2017-10-30T17:43:50Z STOP=2017-10-30T17:43:59Z
SessionDesc:PT=9 PPS=50 SSUP=off
CallID:f2bf0b55a240c579cb27143b7884fffe
ToID:<sip:6137451576@mypbx.ca>
FromID:"7040" <sip:7040@mypbx.ca>
LocalAddr:IP=192.168.244.100 PORT=2226 SSRC=3677617454
RemoteAddr:IP=12.34.62.153 PORT=17628 SSRC=2011548029
JitterBuffer:JBA=3 JBR=5 JBN=30 JBM=40 JBX=160
PacketLoss:NLR=0.0 JDR=0.0
BurstGapLoss:BLD=0.0 BD=0 GLD=0.0 GD=9300 GMIN=16
Delay:RTD=68 ESD=49 OWD=83 IAJ=0
Signal:RERL=127
QualityEst:RLQ=94 RCQ=92 MOSLQ=3.8 MOSCQ=3.8
DialogID:f2bf0b55a240c579cb27143b7884fffe;to-tag=as05125183;from-tag=C9C1489A-102B2DD9

I get the same result in both modes. Let me know if you'd rather I open a separate ticket for this...

miken32 commented 7 years ago

And I did update my comment from 4 days ago with the zip file I meant to upload at that time. It contains output from tcpdump.

miken32 commented 7 years ago

It appears to be trying to parse it as a binary RTCP packet instead of plain text encapsulated in a SIP packet. Does RTCP XR come in two flavours? I think that might be the problem here.
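
To illustrate the two flavours: binary RTCP XR (RFC 3611) is an RTCP packet whose first byte carries a 2-bit version field equal to 2 and whose second byte is packet type 207, while the Polycom reports are text bodies inside a SIP PUBLISH. The sketch below shows why a text PUBLISH cannot pass a binary version check. It assumes the "Wrong version" log line comes from a test of that version field, which this thread does not confirm; `rtcp_version` and `classify` are hypothetical helpers, not captagent functions.

```python
RTCP_VERSION = 2      # value of the 2-bit version field in every RTCP packet
RTCP_PT_XR = 207      # RTCP packet type for Extended Reports (RFC 3611)


def rtcp_version(payload):
    """Top two bits of the first byte, read as an RTCP version field."""
    return payload[0] >> 6


def classify(payload):
    """Rough guess at what a UDP payload contains (illustrative heuristic)."""
    if not payload:
        return "empty"
    first_line = payload.split(b"\r\n", 1)[0]
    method = first_line.split(b" ", 1)[0]
    # A SIP request line looks like "METHOD request-uri SIP/2.0".
    if method.isalpha() and method.isupper() and first_line.endswith(b"SIP/2.0"):
        return "sip"
    if (len(payload) >= 4
            and rtcp_version(payload) == RTCP_VERSION
            and payload[1] == RTCP_PT_XR):
        return "rtcp-xr"
    return "unknown"


publish = b"PUBLISH sip:12.34.62.205:5060 SIP/2.0\r\nEvent: vq-rtcpxr\r\n\r\n"
binary_xr = bytes([0x80, RTCP_PT_XR, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01])

print(classify(publish), rtcp_version(publish))
print(classify(binary_xr), rtcp_version(binary_xr))
```

The letter "P" is byte 0x50, so reading a PUBLISH as binary RTCP yields version 1 instead of 2, which would be consistent with the error logged above.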

lmangani commented 7 years ago

@miken32 yes indeed, yours is the PUBLISH type and should enter the following block. Try adding a debug line to confirm, and possibly move the block ahead of the previous check while troubleshooting:

if(parse_full_sip()) {
    ...

lmangani commented 7 years ago

If you're able to provision your devices with a dedicated IP:PORT sending to the Agent, make sure you're using the vqmon socket instead of the raw one.

miken32 commented 7 years ago

Thanks for the assistance; I will investigate further, but I suspect this is a configuration issue on my side now!