Closed miken32 closed 7 years ago
Hi @miken32, thanks for the details. We're investigating this case. Do you have a core dump for the original failure? Removing the other modules will make things even more complex to troubleshoot.
Oct 24 18:51:17 marceline kernel: captagent[20897]: segfault at 7f85922cd848 ip 00007f85920c79aa sp 00007f85920bfa50 error 4 in socket_collector.so[7f85920c5000+4000]
@miken32 try after this https://github.com/sipcapture/captagent/commit/5333ddf76f21a153d71131f71ed1793180f784cf
Still the same result using new config files (after git pull and recompile as well.) Attached is latest core dump.
segfault at 7fc2dcbc98b0 ip 00007fc2dc9c19aa sp 00007fc2dc9b9a50 error 4 in socket_collector.so[7fc2dc9bf000+4000]
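For anyone comparing the two kernel log lines, a minimal sketch (not part of captagent) that subtracts the mapping base from the instruction pointer. The regex and the helper name `fault_offset` are illustrative assumptions. It also assumes the reported mapping starts at the beginning of socket_collector.so, which may not hold for every build.

```python
import re

# Pattern for the kernel segfault line format seen in the two log lines above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (object name, offset of the faulting instruction inside the mapping)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    return m.group("obj"), int(m.group("ip"), 16) - int(m.group("base"), 16)


first = ("captagent[20897]: segfault at 7f85922cd848 ip 00007f85920c79aa "
         "sp 00007f85920bfa50 error 4 in socket_collector.so[7f85920c5000+4000]")
second = ("segfault at 7fc2dcbc98b0 ip 00007fc2dc9c19aa "
          "sp 00007fc2dc9b9a50 error 4 in socket_collector.so[7fc2dc9bf000+4000]")

for line in (first, second):
    obj, off = fault_offset(line)
    print(obj, hex(off))  # both lines give the same offset: 0x29aa
```

Both crashes land on the same offset, which suggests the same instruction is failing before and after the rebuild. With a binary built with debug symbols, that offset could be passed to something like `addr2line -e socket_collector.so 0x29aa` to get a source line.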
Let's see your collector capture plan, please. Also, if possible, capture the packets that reach the agent socket and cause the crash so we can investigate.
@miken32 the core you sent is good for your compiled captagent; on my side I just see the segfault but not the backtrace from gdb:
0 0x00007fc2dc9c19aa in ?? ()
1 0x00007fc2dcbc3180 in ?? ()
2 0x00007fc2dc9babb0 in ?? ()
3 0x00007fc2d40008c0 in ?? ()
4 0x0000000000010000 in ?? ()
5 0x00007fc2d40008c0 in ?? ()
6 0x0000000000000000 in ?? ()
Can you send me a pcap to reproduce the segfault, and/or show me the backtrace from your gdb? You can also send me (by mail) your captagent binary. Thank you.
Apologies, this is all pretty foreign to me (I'm mostly a system administrator, with very little experience with this debugging stuff!)
So, I just set CFLAGS=-g before compile, ran debuginfo-install for the relevant libraries, and got this output from gdb:
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>,
loc=0x3f9818ee40) at ../stdlib/strtol_l.c:298
#1 0x00007f0525bc59c3 in atoi (handle=<value optimized out>, nread=<value optimized out>,
rcvbuf=..., addr=0x7f0525bbebc0, flags=<value optimized out>) at /usr/include/stdlib.h:286
#2 on_recv (handle=<value optimized out>, nread=<value optimized out>, rcvbuf=...,
addr=0x7f0525bbebc0, flags=<value optimized out>) at socket_collector.c:244
#3 0x00007f0527e017d3 in uv__udp_recvmsg (loop=0x1ce2d30, w=0x7f0525dc6a90, revents=1)
at ../src/unix/udp.c:242
#4 uv__udp_io (loop=0x1ce2d30, w=0x7f0525dc6a90, revents=1) at ../src/unix/udp.c:179
#5 0x00007f0527e027a4 in uv__io_poll (loop=0x1ce2d30, timeout=-1)
at ../src/unix/linux-core.c:308
#6 0x00007f0527df72cc in uv_run (loop=0x1ce2d30, mode=<value optimized out>)
at ../src/unix/core.c:317
#7 0x00007f0527df60ef in uv__thread_start (ctx_v=<value optimized out>)
at ../src/uv-common.c:323
#8 0x0000003f98607aa1 in start_thread (arg=0x7f0525bc2700) at pthread_create.c:301
#9 0x0000003f97ee8bcd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Core dump, binary, and pcap are attached: debug.zip
@miken32 Don't worry :) Thanks a lot, I'll check asap
Oh, and as mentioned earlier, I'm using the default capture plan:
capture[collector] {
    # here we can check source/destination IP/port, message size
    if(msg_check("size", "10")) {
        # check if pkt is rtcp-xr
        if(is_rtcpxr()) {
            # if yes, parse the field and make a json output
            if(parse_rtcpxr_to_json()) {
                if(!send_hep("hepsocket")) {
                    clog("ERROR", "Error sending !!!!");
                }
            } else {
                clog("ERROR", "couldn't parse RTCP-XR to json");
            }
        } else {
            clog("ERROR", "This is not RTCP-RX");
        }
        # Do parsing
        if(parse_full_sip()) {
            # check if our method is PUBLISH
            if(sip_is_method() && sip_check("method","PUBLISH")) {
                # Currently we send reply automatically
                # send_rtcpxr_reply("200", "OK");
                # Many profiles can be defined in transport_hep.xml
                if(!send_hep_proto("hepsocket", "99")) {
                    clog("ERROR", "Error sending HEP!!!!");
                }
            } else {
                send_reply("503", "Server internal error");
            }
        }
    }
    drop;
}
Just to be sure, are you using the latest git?
Yes, updated again this morning before recompiling and re-installing.
Just to be sure, I set up my repo like this:
git clone https://github.com/sipcapture/homer.git
cd homer
git submodule init
git submodule update --init --recursive
git submodule foreach git pull origin master
And I am pulling updates by running the last two commands again. Is this correct?
@miken32 I think we're talking about captagent, the HOMER installation method is a different deal and it's best to "split" the two for this troubleshooting. If you haven't done so, please pull and recompile captagent and try again.
Yeah Homer links the captagent source as a sub module. I’ve confirmed the most recent changes are in my code.
@miken32 can you try now? https://github.com/sipcapture/captagent/commit/1368c0e8074cb959a6d716ff17073fa2f586ecbb
BTW in your zip file there is still only the bin
Will give it a try in a couple of minutes. I was actually doing this very thing, LDEBUG("LOC_IDX in ON_RCV = %d\n", loc_idx); a few seconds ago, and noticed I was getting values of 0 (which seemed good) and 39 (which seemed bad).
That seems to have worked. Now I get past the previous point in the code. Next step is to figure out what is wrong with Polycom's packets...
Oct 30 13:34:32 marceline captagent[12187]: [ERR] protocol_rtcpxr.c:127 Wrong version
Oct 30 13:34:32 marceline captagent[12187]: [ERR] protocol_sip.c:133 This is not RTCP-RX
(I suspect that should be RTCP-XR in the error message!)
Thanks @miken32. Any chance you can share a few example packets from the Polycoms with us (privately at support@sipcapture.org if preferred) so we can check them out too?
No problem, here's a sample packet. Nothing changed except the PBX hostname, collector IP address, and phone MAC address:
PUBLISH sip:12.34.62.205:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.244.100;branch=z9hG4bKd165d3558B89CFE
From: "7040" <sip:7040@mypbx.ca>;tag=6B39152F-396775C0
To: <sip:12.34.62.205:5060>
CSeq: 1 PUBLISH
Call-ID: b6d4a078b6835eb3131a62bf2c84fffe
Contact: <sip:7040@192.168.244.100>
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
Event: vq-rtcpxr
User-Agent: PolycomVVX-VVX_600-UA/5.5.2.8571_0004f284fffe
Accept-Language: en
Max-Forwards: 70
Expires: 3600
Content-Type: application/vq-rtcpxr
Content-Length: 789
VQSessionReport: CallTerm
CallID:ab647484a4aaea9e5e4266fc6084fffe
LocalID: "7040" <sip:7040@mypbx.ca>
RemoteID: <sip:6137451576@mypbx.ca>
OrigID: "7040" <sip:7040@mypbx.ca>
LocalGroup: Unknown
LocalAddr:IP= 192.168.244.100 PORT=2226 SSRC=3872460655
LocalMAC: 0004f284fffe
RemoteAddr:IP= 12.34.62.153 PORT=12140 SSRC=208925045
LocalMetrics:
TimeStamps:START=2017-10-30T17:50:01Z STOP=2017-10-30T17:50:11Z
SessionDesc:PT=9 PPS=50 SSUP=off
JitterBuffer:JBA=3 JBR=5 JBN=50 JBM=150 JBX=160
PacketLoss:NLR=0.0 JDR=6.2
BurstGapLoss:BLD=48.0 BD=1550 GLD=0.0 GD=5240 GMIN=16
Delay:RTD=66 ESD=65 OWD=98 IAJ=0
Signal:RERL=127
QualityEst:RLQ=79 RCQ=77 MOSLQ=3.3 MOSCQ=3.2
DialogID:ab647484a4aaea9e5e4266fc6084fffe;to-tag=as305083fd;from-tag=AE8EE81B-88492D5C
With RFC 6035 compliance disabled (uses "the existing draft implementation"):
PUBLISH sip:12.34.62.205:5060 SIP/2.0
Via: SIP/2.0/UDP 192.168.244.100;branch=z9hG4bKc57b7386867DB975
From: "7040" <sip:7040@mypbx.ca>;tag=3D016C9C-2A830963
To: <sip:12.34.62.205:5060>
CSeq: 1 PUBLISH
Call-ID: 68892a46cd6290ab018841c0c984fffe
Contact: <sip:7040@192.168.244.100>
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, INFO, MESSAGE, SUBSCRIBE, NOTIFY, PRACK, UPDATE, REFER
Event: vq-rtcpxr
User-Agent: PolycomVVX-VVX_600-UA/5.5.2.8571_0004f284fffe
Accept-Language: en
Max-Forwards: 70
Expires: 3600
Content-Type: application/vq-rtcpxr
Content-Length: 678
VQSessionReport
LocalMetrics:
TimeStamps:START=2017-10-30T17:43:50Z STOP=2017-10-30T17:43:59Z
SessionDesc:PT=9 PPS=50 SSUP=off
CallID:f2bf0b55a240c579cb27143b7884fffe
ToID:<sip:6137451576@mypbx.ca>
FromID:"7040" <sip:7040@mypbx.ca>
LocalAddr:IP=192.168.244.100 PORT=2226 SSRC=3677617454
RemoteAddr:IP=12.34.62.153 PORT=17628 SSRC=2011548029
JitterBuffer:JBA=3 JBR=5 JBN=30 JBM=40 JBX=160
PacketLoss:NLR=0.0 JDR=0.0
BurstGapLoss:BLD=0.0 BD=0 GLD=0.0 GD=9300 GMIN=16
Delay:RTD=68 ESD=49 OWD=83 IAJ=0
Signal:RERL=127
QualityEst:RLQ=94 RCQ=92 MOSLQ=3.8 MOSCQ=3.8
DialogID:f2bf0b55a240c579cb27143b7884fffe;to-tag=as05125183;from-tag=C9C1489A-102B2DD9
I get the same result in both modes. Let me know if you'd rather I open a separate ticket for this...
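To make the structure of these report bodies concrete, here is a rough sketch of reading the text payload into a dictionary. It is not how captagent parses the report; the function name `parse_vq_report` and both regexes are assumptions for illustration. The sample lines are copied from the first packet above, including the stray space after `IP=`.

```python
import re

# KEY=value pairs as they appear in the metric lines; tolerates "IP= 1.2.3.4".
PAIR_RE = re.compile(r"([A-Z]+)=\s*(\S+)")
# A line body that consists only of such pairs.
METRIC_LINE_RE = re.compile(r"(?:[A-Z]+=\s*\S+\s*)+")


def parse_vq_report(body):
    """Split each 'Name:rest' line; metric lines become dicts, others stay strings."""
    report = {}
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, _, rest = line.partition(":")
        rest = rest.strip()
        if METRIC_LINE_RE.fullmatch(rest):
            report[name] = dict(PAIR_RE.findall(rest))
        else:
            report[name] = rest
    return report


sample = """\
VQSessionReport: CallTerm
LocalAddr:IP= 192.168.244.100 PORT=2226 SSRC=3872460655
JitterBuffer:JBA=3 JBR=5 JBN=50 JBM=150 JBX=160
PacketLoss:NLR=0.0 JDR=6.2
QualityEst:RLQ=79 RCQ=77 MOSLQ=3.3 MOSCQ=3.2
"""

report = parse_vq_report(sample)
print(report["QualityEst"]["MOSLQ"])  # values are kept as strings
print(report["LocalAddr"])
```

The point is only that the payload is line-oriented text, so it needs a text parser and not a binary RTCP one.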
And I did update my comment from 4 days ago with the zip file I meant to upload at that time. It contains output from tcpdump.
It appears to be trying to parse it as a binary RTCP packet instead of plain text encapsulated in a SIP packet. Does RTCP XR come in two flavours? I think that might be the problem here.
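A small sketch of why a binary RTCP parser would reject this payload, assuming it checks the version field in the first byte as laid out in RFC 3550. Whether protocol_rtcpxr.c does exactly this is an assumption; the `classify` helper is hypothetical and not captagent code.

```python
RTCP_VERSION = 2                  # RFC 3550: version is the top two bits of byte 0
RTCP_PACKET_TYPES = range(200, 208)  # SR (200) through XR (207, RFC 3611)


def classify(payload: bytes) -> str:
    """Rough guess at what a UDP payload contains; illustrative only."""
    if (len(payload) >= 2
            and payload[0] >> 6 == RTCP_VERSION
            and payload[1] in RTCP_PACKET_TYPES):
        # A compound packet normally starts with SR/RR, so XR may not come first.
        return "binary RTCP"
    first_line = payload.split(b"\r\n", 1)[0]
    if first_line.startswith(b"PUBLISH ") and first_line.endswith(b" SIP/2.0"):
        return "SIP PUBLISH"
    return "unknown"


sample = b"PUBLISH sip:12.34.62.205:5060 SIP/2.0\r\nEvent: vq-rtcpxr\r\n\r\n"
print(classify(sample))   # SIP PUBLISH
print(sample[0] >> 6)     # 1: 'P' is 0x50, so a binary parser reads version 1, not 2
```

Read as binary RTCP, the leading `P` of `PUBLISH` yields version 1, which would be consistent with the "Wrong version" log line.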
@miken32 yes indeed, yours is the PUBLISH type and should enter the following block. Try adding a debug line there, and possibly move the block ahead of the previous check for troubleshooting:
if(parse_full_sip()) {
...
If you're able to provision your devices with a dedicated IP:PORT sending to the Agent, make sure you're using the vqmon socket instead of the raw one.
Thanks for the assistance; I will investigate further, but I suspect this is a configuration issue on my side now!
Trying to set up this software for the first time; I think I've got everything set up properly, but I get a segfault as soon as an RTCP-XR report comes in.
Running on Scientific Linux 6.9.
/usr/local/captagent/etc/captagent/captagent.xml
/usr/local/captagent/etc/captagent/socket_collector.xml
Typical RTCP-XR call report:
/etc/captagent/protocol_rtcpxr.xml and /etc/captagent/captureplans/rtcpxr_capture_plan.cfg are both default. Please advise if I can provide any more detail. Setting debug to 128 didn't seem to help with more verbose logging.