signalwire / freeswitch

FreeSWITCH is a Software Defined Telecom Stack enabling the digital transformation from proprietary telecom switches to a versatile software implementation that runs on any commodity hardware. From a Raspberry PI to a multi-core server, FreeSWITCH can unlock the telecommunications potential of any device.
https://freeswitch.com/#getting-started
Other
3.51k stars 1.41k forks source link

SIP Registration and ping fails after "REFUSED" DNS response #870

Open insiac opened 3 years ago

insiac commented 3 years ago

OS: Debian GNU/Linux 10 (buster) FS version: FreeSWITCH Version 1.10.5-release-17-25569c1631~64bit (-release-17-25569c1631 64bit)

We have two DNS server configured in the /etc/resolve.conf file (10.1.0.[20,21]) When one of the servers (10.1.0.21) was under maintenance, it ended up sending "REFUSED" response back to FS when it was trying to query for the gateway hostname (sipprovider.com). DNS reply:

Internet Protocol Version 4, Src: 10.1.0.21, Dst: 10.2.0.13
User Datagram Protocol, Src Port: 53, Dst Port: 60790
Domain Name System (response)
    Transaction ID: 0x4514
    Flags: 0x8105 Standard query response, Refused
        1... .... .... .... = Response: Message is a response
        .000 0... .... .... = Opcode: Standard query (0)
        .... .0.. .... .... = Authoritative: Server is not an authority for domain
        .... ..0. .... .... = Truncated: Message is not truncated
        .... ...1 .... .... = Recursion desired: Do query recursively
        .... .... 0... .... = Recursion available: Server can't do recursive queries
        .... .... .0.. .... = Z: reserved (0)
        .... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
        .... .... ...0 .... = Non-authenticated data: Unacceptable
        .... .... .... 0101 = Reply code: Refused (5)
    Questions: 1
    Answer RRs: 0
    Authority RRs: 0
    Additional RRs: 0
    Queries
        sipprivider.com: type A, class IN
            Name: sipprivider.com
            [Name Length: 18]
            [Label Count: 3]
            Type: A (Host Address) (1)
            Class: IN (0x0001)
    [Request In: 3]
    [Time: 0.000580000 seconds]

It looks like the error is noted in the debug output, but query to the alternate server is not reattempted:

sres.c:2988 sres_query_report_error() sres(q=0x7fdb78028670): reporting error OK for A sipprovider.com

After this, SIP clients were not able to make calls through the gateway. sngrep showed that SIP REGISRATIONS and pings ( SIP OPTIONS) to the gateway stopped. 'sofia loglevel all 9' output from the even that caused the REJECTED DNS answer:

nua.c:370 nua_handle_magic() nua: nua_handle_magic: entering
nua.c:346 nua_handle_bind() nua: nua_handle_bind: entering
nua.c:925 nua_handle_destroy() nua: nua_handle_destroy: entering
nua_stack.c:529 nua_signal() nua(0x7fdb6802f370): sent signal r_destroy
nua_stack.c:569 nua_stack_signal() nua(0x7fdb6802f370): recv signal r_destroy
nta.c:4476 nta_leg_destroy() nta_leg_destroy((nil))
soa.c:356 soa_destroy() soa_destroy(static::0x7fdb7801b140) called
nta.c:1296 agent_timer() nta: timer set next to 4001 ms
nua_common.c:108 nh_create_handle() nua: nh_create_handle: entering
nua.c:346 nua_handle_bind() nua: nua_handle_bind: entering
nua.c:661 nua_options() nua: nua_options: entering
nua_stack.c:529 nua_signal() nua(0x7fdb68031620): sent signal r_options
nua_stack.c:569 nua_stack_signal() nua(0x7fdb68031620): recv signal r_options
nua_params.c:482 nua_stack_set_params() nua: nua_stack_set_params: entering
soa.c:280 soa_clone() soa_clone(static::0x7fdb78001b90, 0x7fdb78001390, 0x7fdb68031620) called
soa.c:403 soa_set_params() soa_set_params(static::0x7fdb7801b140, ...) called
soa.c:403 soa_set_params() soa_set_params(static::0x7fdb7801b140, ...) called
nta.c:2671 nta_tpn_by_url() nta: selecting scheme sip
sres_cache.c:272 sres_cache_get() sres_cache_get(0x7fdb780044a0, SRV, "_sip._udp.sipprovider.com.") called
sres_cache.c:318 sres_cache_get() sres_cache_get(0x7fdb780044a0, SRV, "_sip._udp.sipprovider.com.") returned 1 entries
nta.c:10626 outgoing_query_srv() nta: for "sipprovider.com" query "_sip._udp.sipprovider.com" SRV (cached)
sres_cache.c:272 sres_cache_get() sres_cache_get(0x7fdb780044a0, A, "sipprovider.com.") called
nta.c:10834 outgoing_query_a() nta: for "sipprovider.com" query "sipprovider.com" A
sres.c:968 sres_query() sres_query(0x7fdb780041c0, 0x7fdb78012e80, A, "sipprovider.com") called
sres.c:2731 sres_send_dns_query() sres_send_dns_query(0x7fdb780041c0, 0x7fdb78028670) called
sres.c:2820 sres_send_dns_query() sres_send_dns_query(0x7fdb780041c0, 0x7fdb78028670) id=5012 A sipprovider.com (to [10.1.0.20]:53)
sres.c:3017 sres_resolver_timer() sres_resolver_timer() called at 1600816237
sres.c:3061 sres_resend_dns_query() sres_resend_dns_query(0x7fdb780041c0, 0x7fdb78028670, timeout) called
sres.c:2731 sres_send_dns_query() sres_send_dns_query(0x7fdb780041c0, 0x7fdb78028670) called
sresolv.c:190 sres_sofia_update() sres_sofia_update(0x7fdb78005240, 56, -1)
sres.c:2820 sres_send_dns_query() sres_send_dns_query(0x7fdb780041c0, 0x7fdb78028670) id=5012 A sipprovider.com (to [10.1.0.21]:53)
sres.c:3468 sres_resolver_receive() sres_resolver_receive(0x7fdb780041c0, 56) called
sres.c:3573 sres_log_response() sres_resolver_receive(0x7fdb780041c0, 0x7fdb78028670) id=5012 (from [10.1.0.21]:53)
sres.c:2988 sres_query_report_error() sres(q=0x7fdb78028670): reporting error OK for A sipprovider.com
sres.c:3468 sres_resolver_receive() sres_resolver_receive(0x7fdb780041c0, 30) called
sres.c:3631 sres_decode_msg() sres_decode_msg: matching query for id=5012
sres.c:3573 sres_log_response() sres_resolver_receive(0x7fdb780041c0, (nil)) id=5012 (from [10.1.0.20]:53)
nta.c:9122 outgoing_timer_dk() nta: timer K fired, terminate OPTIONS (25853802)
nta.c:8811 outgoing_reclaim_queued() outgoing_reclaim_all((nil), (nil), 0x7fdb9aff3af0)
nta.c:8941 _nta_outgoing_timer() nta_outgoing_timer: 0/0 resent, 0/0 tout, 1/1 term, 1/2 free
nta.c:1289 agent_timer() nta: timer not set

Once the "failing" DNS server was removed from the resolve.conf FS reregisterd with the gateway, but the pings did not restart. After restarting FS, the pings to the gateway resumed.

cdosoftei commented 3 years ago
sres.c:2988 sres_query_report_error() sres(q=0x7fdb78028670): reporting error OK for A sipprovider.com

:point_up: this stands out, the DNS response is decoded properly but not accounted for as an error; https://github.com/freeswitch/sofia-sip/pull/29 should fix that and enable the desired behaviour.