sipcapture / heplify

Portable and Lightweight HEP Capture Agent for HOMER
https://sipcapture.org
GNU Affero General Public License v3.0

Unable to correlate RTCP with Asterisk and endpoint NAT=yes #188

Closed: Demolish50 closed this issue 4 years ago.

Demolish50 commented 4 years ago

Debugging RTCP shows "DBG [rtcp] No correlationID" with heplify debug turned on.

Looking at the PCAP, I can see that heplify is unable to correlate the RTCP stream because the source IP of the RTCP packets differs from the address in the SDP body. I learned this from "https://github.com/sipcapture/heplify/issues/76"
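
To make the failure mode concrete, here is a simplified model of SDP-based RTCP correlation. It assumes the correlator indexes each call under the SDP c= address and the m=audio port plus one (the RFC 3550 convention that RTCP uses the next higher port), then looks up each incoming RTCP packet by its source IP and port. This is an illustrative sketch, not heplify's actual correlator code; mediaKey, rtcpKeyFromSDP, the SDP port and the 203.0.113.10 "public" address are all made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// mediaKey identifies a media endpoint as "ip:port".
func mediaKey(ip string, port int) string {
	return ip + ":" + strconv.Itoa(port)
}

// rtcpKeyFromSDP extracts the connection address (c=) and the audio port
// (m=audio) from an SDP body and returns the key under which RTCP is
// expected, assuming RTCP uses the RTP port + 1 (RFC 3550 convention).
func rtcpKeyFromSDP(sdp string) (string, bool) {
	var ip string
	port := -1
	sc := bufio.NewScanner(strings.NewReader(sdp))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "c=IN IP4 "):
			ip = strings.TrimPrefix(line, "c=IN IP4 ")
		case strings.HasPrefix(line, "m=audio "):
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				if p, err := strconv.Atoi(fields[1]); err == nil {
					port = p
				}
			}
		}
	}
	if ip == "" || port < 0 {
		return "", false
	}
	return mediaKey(ip, port+1), true
}

func main() {
	// SDP as sent by a phone behind NAT: the c= address is private.
	sdp := "v=0\r\nc=IN IP4 192.168.1.135\r\nm=audio 16620 RTP/AVP 0\r\n"
	sessions := map[string]string{} // RTCP key -> call identifier
	if key, ok := rtcpKeyFromSDP(sdp); ok {
		sessions[key] = "call-id-placeholder"
	}
	// The RTCP actually arrives from the NAT's public address, so the lookup misses.
	observed := mediaKey("203.0.113.10", 16621)
	if _, ok := sessions[observed]; !ok {
		fmt.Println("no correlationID for", observed)
	}
}
```

Running it prints "no correlationID for 203.0.113.10:16621", which mirrors the DBG line above: the RTCP packet itself is fine, but the key it is looked up under was never stored.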

Here is the configuration.

I have a PCAP I can share if there is a private way to send it.

Not sure what else to do.

TheZLabs commented 4 years ago

Having the same issue with Genesys media servers. SIP data comes through fine, but the RTCP fail count just keeps going up. Running debug shows "No correlationID", same as above. I would really love to get QoS data from Homer for all these calls; there's not much point in using it for me if this part doesn't work. Everything else seems to work really well with this exception. I used the Docker setup, if it matters.

Demolish50 commented 4 years ago

I'm able to get a lot more correlation with captagent, but not perfect. I get all the RTP streams from the PBX to the phones (bi-directional) correlated but none from the PBX to the SIP provider. Trying to see what to do about that now.

lmangani commented 4 years ago

To address this type of scenario, focus on the SDP portion of the calls missing the SIP/RTCP correlation. The cause is most likely NAT. Feel free to paste examples for suggestions.

Demolish50 commented 4 years ago

Thanks for the reply. It's certainly a NAT issue, I'm 99% sure. I'm just not sure how captagent manages to figure it out.

I did manage to get captagent to sort out both sides (PBX to SIP provider and PBX to endpoints) in this configuration, but I have another issue with captagent and RTCP-XR and would rather use the heplify client anyway.

I can't seem to get the heplify client to do the same. I can now get heplify to correlate correctly from the PBX to the SIP trunk provider (no NAT), but I am unable to get it to correlate from the PBX to the phones when the phones are behind NAT. It's obvious why: the SDP shows an internal address, while the RTP comes from a different address.
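
The thread doesn't say how captagent's nat-enable option works internally. One plausible heuristic, shown here purely as an assumption, is to distrust a private c= address when the SIP message carrying that SDP arrived from a public address, and to index the call under the observed signaling source instead. effectiveMediaIP is a hypothetical helper; it relies only on net/netip from the Go standard library (Addr.IsPrivate, Go 1.18+).

```go
package main

import (
	"fmt"
	"net/netip"
)

// effectiveMediaIP picks the address under which RTCP should be indexed.
// Hypothetical NAT heuristic: if the SDP c= address is private
// (RFC 1918 / RFC 4193) but the SIP message carrying it came from a
// non-private address, trust the observed signaling source instead.
func effectiveMediaIP(sdpIP, sipSrcIP string) (string, error) {
	sdpAddr, err := netip.ParseAddr(sdpIP)
	if err != nil {
		return "", err
	}
	srcAddr, err := netip.ParseAddr(sipSrcIP)
	if err != nil {
		return "", err
	}
	if sdpAddr.IsPrivate() && !srcAddr.IsPrivate() {
		return srcAddr.String(), nil
	}
	return sdpAddr.String(), nil
}

func main() {
	// Phone behind NAT: SDP says 192.168.1.135, SIP arrived from 203.0.113.10.
	ip, err := effectiveMediaIP("192.168.1.135", "203.0.113.10")
	if err != nil {
		panic(err)
	}
	fmt.Println(ip)
}
```

This only fixes the IP half of the lookup key. It also assumes the phone's media leaves through the same public address as its signaling, and it does not help if the NAT rewrites ports, which is the problem described later in the thread.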

What exactly would you like pasted? I'm trying to figure out how captagent does this correlation so I can offer a suggestion.

lmangani commented 4 years ago

HEPlify should do the same thing that nat-enable does in captagent, so having some examples would help locate the potential issue.

Demolish50 commented 4 years ago

I can send you the capture I just did. Do you have a private way I can provide it?

I can see in the capture that the message body of the SDP contains "Connection Information (c): IN IP4 192.168.1.135"

At the same time, I see this in the debug output:

2020/08/20 21:52:55.388655 correlator.go:147: DBG [rtcp] No correlationID for srcIP=34.203.251.161, srcPort=19559, dstIP=redactedpublicIP, dstPort=16621, payload={"sender_information":{"ntp_timestamp_sec":3806949174,"ntp_timestamp_usec":3232310656,"rtp_timestamp":2763387128,"packets":29247,"octets":4679520},"ssrc":2035215409,"type":202,"report_count":1,"report_blocks":[{"source_ssrc":64651983,"fraction_lost":0,"packets_lost":10,"highest_seq_no":38173,"ia_jitter":23,"lsr":1865531259,"dlsr":327680}],"report_blocks_xr":{"type":0,"id":0,"fraction_lost":0,"fraction_discard":0,"burst_density":0,"gap_density":0,"burst_duration":0,"gap_duration":0,"round_trip_delay":0,"end_system_delay":0},"sdes_ssrc":2035215409}

Demolish50 commented 4 years ago

I've found some additional information and now have RTCP data from my PBX to my endpoints, but not from the phones to the PBX. I'm still working that out. It seems consistent, but only time will tell. I had to create a NAT policy for my RTP traffic on my firewall (the one in front of the phones) and disable source port remapping. I also noticed in the capture that heplify didn't seem to even attempt to process the RTCP packets on ports above 20,000, or at least it appeared that way. For every RTCP packet on a port below that, heplify at least tried.
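
The firewall fix lines up with how RTCP ports are usually chosen: by the RFC 3550 convention, RTCP uses the RTP port plus one, so a NAT that remaps source ports from a shared pool can split the pair apart. The model below is a hypothetical NAT (natMapping and its sequential allocator are made up, not any vendor's behavior) that compares port-preserving translation with pooled remapping when another call's binding happens to be created in between.

```go
package main

import "fmt"

// natMapping models the source-port bindings of a NAT for one internal host.
// With preservePorts the external port equals the internal one (the effect of
// disabling source port remapping); otherwise ports come from a sequential pool.
type natMapping struct {
	preservePorts bool
	nextPort      int
	bindings      map[int]int // internal port -> external port
}

func newNAT(preservePorts bool, poolStart int) *natMapping {
	return &natMapping{preservePorts: preservePorts, nextPort: poolStart, bindings: map[int]int{}}
}

// translate returns the external source port for an internal source port,
// creating a binding on first use and reusing it afterwards.
func (n *natMapping) translate(internal int) int {
	if ext, ok := n.bindings[internal]; ok {
		return ext
	}
	ext := internal
	if !n.preservePorts {
		ext = n.nextPort
		n.nextPort++
	}
	n.bindings[internal] = ext
	return ext
}

func main() {
	for _, preserve := range []bool{true, false} {
		nat := newNAT(preserve, 40000)
		rtp := nat.translate(16620)  // RTP of call A
		nat.translate(16640)         // another call grabs the next pool port
		rtcp := nat.translate(16621) // RTCP of call A
		fmt.Printf("preserve=%v rtp=%d rtcp=%d paired=%v\n", preserve, rtp, rtcp, rtcp == rtp+1)
	}
}
```

With port preservation the pair stays 16620/16621. With remapping, call A's RTP goes out on 40000 but its RTCP on 40002, so a lookup derived from the SDP port cannot match even if the IP is handled.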

The earlier debug output above was unrelated; I was chasing my tail there. I still have the capture that shows the behavior if you want it. I just need a private email address or some way to send you a private Dropbox link.

On to figuring out why I don't see XR packets in Homer. The QoS tab doesn't show the data when the packets are XR. heplify does seem to be sending the XR packets (I see them in the RTCP debug output and they are correlated); they just aren't showing up in Homer.

Demolish50 commented 4 years ago

I guess this should probably go somewhere else and we can close this issue, but I'll go ahead and share what else is going on.

I still can't get RTCP-XR to show up in Homer. heplify is delivering the XR data to the server that Homer, Loki and Prometheus run on. The XR data shows up in Loki and in Prometheus, but not on the QoS tab in Homer. I see the non-XR packets from the PBX to the phone, but the XR packets from the phone to the PBX just aren't there, even though the data is in Loki and Prometheus.

The Loki logs clearly show the XR data below. It also shows up in the heplify_rtcpxr_round_trip_delay metric. There are no RTCPFail messages from the heplify client either.

2020-08-24 11:20:53 | {"sender_information":{"ntp_timestamp_sec":3807274852,"ntp_timestamp_usec":3220500000,"rtp_timestamp":26731408,"packets":33500,"octets":5360000},"ssrc":1862408774,"type":207,"report_count":1,"report_blocks":[{"source_ssrc":0,"fraction_lost":0,"packets_lost":34,"highest_seq_no":53972,"ia_jitter":0,"lsr":1734354463,"dlsr":323748}],"report_blocks_xr":{"type":7,"id":0,"fraction_lost":0,"fraction_discard":0,"burst_density":0,"gap_density":0,"burst_duration":180,"gap_duration":3350,"round_trip_delay":34,"end_system_delay":47},"sdes_ssrc":1862408774} src_ip=**phonepublicIP** dst_ip=**PBXpublicIP** id=26cf682239ea60c83aec47a04ae21c21@pbxpublicip:5060
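
In that log line the packet type is 207, which is RTCP XR (RFC 3611), and report_blocks_xr has type 7, which RFC 3611 defines as the VoIP Metrics block; its burst/gap duration and delay fields are in milliseconds. The sketch below decodes just those fields so the values can be checked against what Homer should display. The struct names and describeXR are hypothetical, written for this example rather than taken from heplify or Homer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// XRBlock mirrors part of the report_blocks_xr object in the log line.
type XRBlock struct {
	Type           int `json:"type"`
	BurstDuration  int `json:"burst_duration"`   // ms
	GapDuration    int `json:"gap_duration"`     // ms
	RoundTripDelay int `json:"round_trip_delay"` // ms
	EndSystemDelay int `json:"end_system_delay"` // ms
}

type rtcpXR struct {
	Type int     `json:"type"`
	XR   XRBlock `json:"report_blocks_xr"`
}

// xrBlockName maps RFC 3611 report block types to their names.
var xrBlockName = map[int]string{
	1: "Loss RLE", 2: "Duplicate RLE", 3: "Packet Receipt Times",
	4: "Receiver Reference Time", 5: "DLRR", 6: "Statistics Summary",
	7: "VoIP Metrics",
}

// describeXR decodes the JSON payload and summarizes its XR block.
func describeXR(raw string) (string, error) {
	var p rtcpXR
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return "", err
	}
	if p.Type != 207 {
		return "", fmt.Errorf("packet type %d is not RTCP XR (207)", p.Type)
	}
	name, ok := xrBlockName[p.XR.Type]
	if !ok {
		name = fmt.Sprintf("unknown block type %d", p.XR.Type)
	}
	return fmt.Sprintf("%s: rtt=%dms end_system=%dms burst=%dms gap=%dms",
		name, p.XR.RoundTripDelay, p.XR.EndSystemDelay,
		p.XR.BurstDuration, p.XR.GapDuration), nil
}

func main() {
	raw := `{"type":207,"report_blocks_xr":{"type":7,"burst_duration":180,"gap_duration":3350,"round_trip_delay":34,"end_system_delay":47}}`
	s, err := describeXR(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

For the values in the log it prints "VoIP Metrics: rtt=34ms end_system=47ms burst=180ms gap=3350ms", matching the round-trip delay exported as heplify_rtcpxr_round_trip_delay.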

Demolish50 commented 4 years ago

I've verified that the data actually exists in the homer_data DB, in both directions.

Only one direction is missing from the QoS tab. No idea why, but I guess that's a Homer thing, so I'll close this.

In case anyone else finds this: https://github.com/sipcapture/homer/issues/412