processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

Issue Failed to establish outbound s2s connection domain -> user: DNS lookup failed: non-existing domain; bouncing for 174 seconds #2717

Closed brunopistone closed 5 years ago

brunopistone commented 5 years ago

Ejabberd version: 18.09

ejabberd.yml:

websocket_ping_interval: 30

websocket_timeout: 90

loglevel: 4

log_rotate_size: 50485760
log_rotate_date: "$D0"
log_rotate_count: 10

log_rate_limit: 100

hosts:
  - "wcs.test.biz"

certfiles:
  - "/opt/ejabberd-18.09/conf/server.pem"

ca_file: "/opt/ejabberd-18.09/conf/cacert.pem"

listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    starttls: true
    access: s2s
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      "/http-bind/ws": ejabberd_http_ws
      "/http-bind": mod_bosh
      "/oauth": ejabberd_oauth
      "/api": mod_http_api
    web_admin: false
    ## register: true
    captcha: false
  -
    port: 7070
    ip: "::"
    module: ejabberd_http
    request_handlers:
      "/http-bind/ws": ejabberd_http_ws
      "/http-bind": mod_bosh
    web_admin: false
    captcha: false

  -
    port: 9090
    ip: "::"
    module: ejabberd_http
    request_handlers:
      "/api": mod_http_api
    web_admin: true
    http_bind: true
    captcha: false

s2s_use_starttls: optional

auth_method: internal
resource_conflict: closeold

shaper:
  normal: 4000000
  fast: 8000000

max_fsm_queue: 10000

acl:
  admin:
    user:
      - "admin": "wcs.test.biz"
  local:
    user_regexp: ""
  loopback:
    ip:
      - "127.0.0.0/8"
      - "::1/128"
      - "::FFFF:127.0.0.1/128"

shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    - 5000: admin
    - 100
  c2s_shaper:
    - none: admin
    - normal
  s2s_shaper: fast

access_rules:
  local:
    - allow: local
  c2s:
    - deny: blocked
    - allow
  s2s:
    - deny: blocked
    - allow: all
  announce:
    - allow: admin
  configure:
    - allow: admin
  muc_create:
    - allow: local
  register:
    - allow
  trusted_network:
    - allow: loopback

api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      - admin
      - users
    what:
      - "*"
  "public commands":
    who:
      - ip: "127.0.0.1/8"
    what:
      - "status"
      - "connected_users_number"

language: "en"

acme:
  contact: "mailto:admin@localhost"
  ca_url: "https://acme-v01.api.letsencrypt.org"

modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce: # recommends mod_adhoc
    access: announce
  mod_blocking: {} # requires mod_privacy
  mod_caps: {}
  mod_carboncopy: {}
  mod_configure: {} # requires mod_adhoc
  mod_bosh: 
    max_inactivity: 20
  mod_last: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    default_room_options:
      allow_subscription: true
  mod_muc_admin: {}
  ## mod_muc_log: {}
  ## mod_multicast: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: 
    send_pings: true
    ping_interval: 20
    ping_ack_timeout: 4
    timeout_action: kill
  mod_privacy: {}
  mod_private: {}
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    welcome_message:
      subject: "Welcome!"
      body: |-
        Hi.
        Welcome to this XMPP server.
    ip_access: trusted_network
    access: register
  mod_roster: {}
  mod_shared_roster: {}
  ## mod_stats: {}
  ## mod_time: {}
  mod_vcard:
    search: false
  mod_vcard_xupdate: {}
  ## mod_avatar: {}
  mod_version: {}
  mod_stream_mgmt: {}
  mod_s2s_dialback: {}
  mod_http_api: {}
  mod_fail2ban: {}

allow_contrib_modules: true

Hi, I have the following problem. I have two clustered ejabberd servers running on two different machines. The case is this: user U is logged in with a resource on ejabberd server A.

I send the following message from machine B, and it is received by ejabberd server B:

2018-12-11 17:40:34.137 [info] <0.816.0>@mod_http_api:log:543 API call send_message [{<<"to">>,<<"agent1@wcs.test.biz/euphonia">>},{<<"body">>,<<"{\"message\":\"logout\",\"time\":\"instant\",\"minutes\":\"null\",\"reason\":\"logout\",\"subject\":\"logout e\",\"command\":\"system_message\"}">>},{<<"subject">>,<<"logout">>},{<<"from">>,<<"admin">>},{<<"type">>,<<"normal">>}] from ::ffff:XXX.XXX.X.X:45430

This is meant to log out user **U** from ejabberd server **A**.
On ejabberd server **A** I receive the following log message:

2018-12-11 17:36:11.564 [warning] <0.650.0>@ejabberd_s2s_out:process_closed:160 Failed to establish outbound s2s connection wcs.test.biz -> admin: DNS lookup failed: non-existing domain; bouncing for 82 seconds
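Worth noting: the failed connection's target domain is literally `admin`, which is the bare `from` value in the API call above. When an address contains no `@`, ejabberd treats the whole string as a remote domain and tries to reach it over s2s. A minimal shell sketch of that domain extraction (a simplification for illustration, not ejabberd's actual code):

```shell
# Simplified sketch of how the routing domain is derived from a JID string:
# the part between '@' and '/'; with no '@', the whole string is the domain.
route_domain() {
  d="${1#*@}"                  # strip "user@" if present
  printf '%s\n' "${d%%/*}"     # strip "/resource" if present
}
route_domain "agent1@wcs.test.biz/euphonia"   # -> wcs.test.biz (local host)
route_domain "admin"                          # -> admin (bogus s2s target)
```

If that is the cause, passing a full JID such as `admin@wcs.test.biz` in the `from` field should avoid the bogus s2s attempt.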



But on server **A** nothing works as expected.
It seems that communication between the two machines doesn't work. Can you help me? Thank you
zinid commented 5 years ago

Not a bug report.

lemon2015 commented 4 years ago

> Not a bug report.

I also encountered this problem, but only when entering certain broken rooms:

2019-12-09 22:20:24.858 [info] <0.28151.8117>@ejabberd_s2s_out:process_closed:161 Failed to establish outbound s2s connection im.project.com -> conference.im.project.com: DNS lookup failed: non-existing domain; bouncing for 198 seconds

When I run any ejabberdctl command related to the problem room, I get a noproc error:

ejabberdctl get_room_options u2_225308_472552502100 conference.im.project.com
Problem 'exit {noproc,{p1_fsm,sync_send_all_state_event,[<0.21954.4>,get_config]}}' occurred executing the command.
Stacktrace: [{p1_fsm,sync_send_all_state_event,2,[{file,"src/p1_fsm.erl"},{line,253}]},
             {mod_muc_admin,get_room_config,1,[{file,"src/mod_muc_admin.erl"},{line,749}]},
             {mod_muc_admin,get_room_options,1,[{file,"src/mod_muc_admin.erl"},{line,1001}]},
             {ejabberd_ctl,call_command,4,[{file,"src/ejabberd_ctl.erl"},{line,352}]},
             {ejabberd_ctl,try_call_command,4,[{file,"src/ejabberd_ctl.erl"},{line,321}]},
             {ejabberd_ctl,process2,4,[{file,"src/ejabberd_ctl.erl"},{line,271}]},
             {ejabberd_ctl,process,2,[{file,"src/ejabberd_ctl.erl"},{line,249}]},
             {rpc,'-handle_call_call/6-fun-0-',5,[{file,"rpc.erl"},{line,197}]}]

So, what should I do? Restart the server?
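Before restarting, it may help to check whether the MUC service still has a live process for that room - a `noproc` exit usually means it doesn't. A sketch using `mod_muc_admin` commands (shown via `echo` so the snippet is side-effect free; drop the `echo` to run them against a live node):

```shell
# List all running rooms; a room absent from this list is exactly the case
# where get_room_options crashes with noproc on old versions.
echo ejabberdctl muc_online_rooms global
# If the room process is wedged rather than missing, destroying it is one option:
echo ejabberdctl destroy_room u2_225308_472552502100 conference.im.project.com
```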

zinid commented 4 years ago

@lemon2015 seems like you're typing in a wrong domain name: conference.im.project.com is not a host served by ejabberd.

lemon2015 commented 4 years ago

> @lemon2015 seems like you're typing in a wrong domain name: conference.im.project.com is not a host served by ejabberd.

Glad to see you. The MUC service of my ejabberd is conference.@HOST@:

  mod_muc:
    db_type: sql
    host: "conference.@HOST@"
    max_user_conferences: 50
    history_size: 20
    default_room_options:
      mam: true
      ##logging: true
      persistent: false
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
  mod_muc_admin: {}

A normal room is OK, for example:

[root@11.22.33.44-ejabberd conf]# ejabberdctl get_room_options u2_mark-test conference.im.project.com
title
description
allow_change_subj       true
allow_query_users       true
allow_private_messages  true
allow_private_messages_from_visitors    anyone
allow_visitor_status    true
allow_visitor_nickchange        true
public  true
public_list     true
persistent      false
moderated       true
captcha_protected       false
members_by_default      true
members_only    false
allow_user_invites      false
allow_subscription      false
password_protected      false
password
anonymous       true
presence_broadcast      [moderator,participant,visitor]
allow_voice_requests    true
voice_request_min_interval      1800
max_users       200
logging false
vcard
captcha_whitelist       {0,nil}
mam     true
pubsub

Only some bad rooms are broken. I don't understand what the user did to cause this room to fail; it behaves as if blacklisted. In addition, I wonder whether a list of invalid rooms is recorded somewhere, for example in Mnesia?
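For what it's worth, running rooms are tracked in the in-memory Mnesia table `muc_online_room`; with `db_type: sql`, only rooms with `persistent: true` are written to the `muc_room` database table, so with the options above there is no durable room list. A sketch of the expressions one could paste into an attached shell (`ejabberdctl debug`), echoed here so the snippet is inert:

```shell
# Erlang expressions for inspecting the live room table (paste into the
# shell opened by `ejabberdctl debug`; table name per stock ejabberd):
echo 'mnesia:table_info(muc_online_room, size).'   # count of running rooms
echo 'ets:tab2list(muc_online_room).'              # full listing (can be large)
```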

zinid commented 4 years ago

Well, then that crash most likely means the room doesn't exist: the command just doesn't handle that situation correctly and crashes.

Regarding `im.bangwo8.com -> conference.im.project.com: DNS lookup failed` - it's quite weird. Do you perhaps have something in crash.log? It seems your MUC service crashes from time to time if you see this message often.
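A quick way to gauge how often the room processes are crashing is to count the `badrecord` terminations recorded in crash.log. The path below is an assumption (it matches the 18.09 layout quoted earlier in this thread), so adjust `LOGDIR` for your install:

```shell
# Count mod_muc_room crashes recorded in crash.log (LOGDIR is an assumed path).
LOGDIR=/opt/ejabberd-18.09/logs
grep -c 'badrecord,state' "$LOGDIR/crash.log" 2>/dev/null || echo "no crash.log under $LOGDIR"
```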

lemon2015 commented 4 years ago

> Well, then that crash most likely means the room doesn't exist: the command just doesn't handle that situation correctly and crashes.
>
> Regarding `im.bangwo8.com -> conference.im.project.com: DNS lookup failed` - it's quite weird. Do you perhaps have something in crash.log? It seems your MUC service crashes from time to time if you see this message often.

That's not the case. There's only a little in the crash log:

2019-12-10 00:57:26 =ERROR REPORT====
** State machine <0.2430.3276> terminating
** Last message in was {'$gen_sync_all_state_event',{<24422.694.0>,#Ref<24422.2822972389.351272961.222273>},{get_disco_item,only_non_empty,{jid,<<>>,<<"multicast.im.project.com">>,<<>>,<<>>,<<"multicast.im.project.com">>,<<>>},<<>>}}
** When State == normal_state
**      Data  == stop
** Reason for termination =
** {{badrecord,state},[{mod_muc_room,get_users_and_subscribers,1,[{file,"src/mod_muc_room.erl"},{line,1119}]},{mod_muc_room,terminate,3,[{file,"src/mod_muc_room.erl"},{line,711}]},{p1_fsm,terminate,8,[{file,"src/p1_fsm.erl"},{line,740}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
2019-12-10 00:57:26 =CRASH REPORT====
crasher:
  initial call: mod_muc_room:init/1
  pid: <0.2430.3276>
  registered_name: []
  exception exit: {{{badrecord,state},[{mod_muc_room,get_users_and_subscribers,1,[{file,"src/mod_muc_room.erl"},{line,1119}]},{mod_muc_room,terminate,3,[{file,"src/mod_muc_room.erl"},{line,711}]},{p1_fsm,terminate,8,[{file,"src/p1_fsm.erl"},{line,740}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]},[{p1_fsm,terminate,8,[{file,"src/p1_fsm.erl"},{line,743}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
  ancestors: ['mod_muc_im.project.com',ejabberd_gen_mod_sup,ejabberd_sup,<0.66.0>]
  message_queue_len: 0
  messages: []
  links: []
  dictionary: [{'$internal_queue_len',0},{rand_seed,{#{bits => 58,jump => #Fun<rand.8.15449617>,next => #Fun<rand.5.15449617>,type => exrop,uniform => #Fun<rand.6.15449617>,uniform_n => #Fun<rand.7.15449617>,weak_low_bits => 1},[204573968280098656|280722831702036086]}}]
  trap_exit: true
  status: running
  heap_size: 987
  stack_size: 27
  reductions: 137884
  neighbours:
2019-12-10 11:30:26 =ERROR REPORT====
unexpected event from XML driver: {xmlstreamelement,{xmlel,<<"message">>,[{<<"xmlns">>,<<"jabber:client">>},{<<"type">>,<<"groupchat">>},{<<"id">>,<<"first@1575948501126">>},{<<"from">>,<<"u1_146489_213509147@im.project.com">>},{<<"to">>,<<"u1_146489_213509147@conference.im.project.com">>}],[{xmlel,<<"body">>,[],[{xmlcdata,<<229,174,137,232,163,133,231,168,142,230,142,167,231,155,152>>}]}]}}; xmlstreamstart was expected
2019-12-06 09:06:42 =CRASH REPORT====
crasher:
  initial call: ejabberd_listener:init/3
  pid: <0.12952.3337>
  registered_name: []
  exception error: {{badmatch,ignore},[{xmpp_socket,start,4,[{file,"src/xmpp_socket.erl"},{line,108}]},{ejabberd_listener,accept,4,[{file,"src/ejabberd_listener.erl"},{line,297}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
  ancestors: [ejabberd_listener,ejabberd_sup,<0.66.0>]
  message_queue_len: 0
  messages: []
  links: [#Port<0.108688595>,<0.561.0>,#Port<0.108326095>]
  dictionary: []
  trap_exit: false
  status: running
  heap_size: 2586
  stack_size: 27
  reductions: 72411746
  neighbours:
2019-12-06 09:06:42 =SUPERVISOR REPORT====
Supervisor: {local,ejabberd_listener}
Context:    child_terminated
Reason:     {{badmatch,ignore},[{xmpp_socket,start,4,[{file,"src/xmpp_socket.erl"},{line,108}]},{ejabberd_listener,accept,4,[{file,"src/ejabberd_listener.erl"},{line,297}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
Offender:   [{pid,<0.12952.3337>},{id,{7222,{0,0,0,0},tcp}},{mfargs,{ejabberd_listener,start,[{7222,{0,0,0,0},tcp},ejabberd_c2s,[{access,c2s},{shaper,c2s_shaper},{max_stanza_size,262144},{protocol_options,[<<"no_sslv3">>]},{starttls,true}]]}},{restart_type,transient},{shutdown,brutal_kill},{child_type,worker}]

2019-12-06 16:52:11 =ERROR REPORT====
unexpected event from XML driver: {xmlstreamelement,{xmlel,<<"message">>,[{<<"xmlns">>,<<"jabber:client">>},{<<"type">>,<<"groupchat">>},{<<"id">>,<<"u2_18_696617244001@1575622332491">>},{<<"from">>,<<"u2_18_696617244001@im.project.com">>},{<<"to">>,<<"u2_18_696617244001@conference.im.project.com">>}],[{xmlel,<<"body">>,[],[{xmlcdata,<<"<img class=\"project_oss_upload\" style=\"max-width:182px;\" src=\"https://alioss.project.com/18/Xft87ADwMJ.jpg\">">>}]}]}}; xmlstreamstart was expected
2019-12-05 10:51:03 =ERROR REPORT====
unexpected event from XML driver: {xmlstreamelement,{xmlel,<<"message">>,[{<<"xmlns">>,<<"jabber:client">>},{<<"type">>,<<"groupchat">>},{<<"id">>,<<"first@1575514263929">>},{<<"from">>,<<"u1_146489_256179671@im.project.com">>},{<<"to">>,<<"undefined@conference.im.project.com">>}],[{xmlel,<<"body">>,[],[{xmlcdata,<<229,139,190,233,128,137,233,151,174,233,162,152>>}]}]}}; xmlstreamstart was expected
2019-12-05 15:30:07 =ERROR REPORT====
unexpected event from XML driver: {xmlstreamelement,{xmlel,<<"message">>,[{<<"xmlns">>,<<"jabber:client">>},{<<"type">>,<<"groupchat">>},{<<"id">>,<<"first@1575531023318">>},{<<"from">>,<<"u2_18_959667086576@im.project.com">>},{<<"to">>,<<"undefined@conference.im.project.com">>}],[{xmlel,<<"body">>,[],[{xmlcdata,<<"<img class=\"project_oss_upload\" style=\"max-width:182px;\" src=\"https://alioss.project.com/18/CMam4D3p8E.jpg\">">>}]}]}}; xmlstreamstart was expected
zinid commented 4 years ago

What ejabberd version are you using? Seems like all those bugs were fixed already.

zinid commented 4 years ago

Also, disable mod_multicast.

lemon2015 commented 4 years ago

> What ejabberd version are you using? Seems like all those bugs were fixed already.

18.01. Is it a bug in that version? It seems the crash log has nothing to do with this problem.

zinid commented 4 years ago

Well, I suggest upgrading the server. Formally speaking, we only accept bug reports against the most recent version, unless it's obvious that a bug from an older version still exists in the current one, which is not the case here - what you see in ejabberd.log is most likely a side effect of the crashes.

lemon2015 commented 4 years ago

> Well, I suggest upgrading the server. Formally speaking, we only accept bug reports against the most recent version, unless it's obvious that a bug from an older version still exists in the current one, which is not the case here - what you see in ejabberd.log is most likely a side effect of the crashes.

Thanks a lot.