processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

Error after disabling and enabling back mod_fail2ban #4211

Closed logicwonder closed 1 month ago

logicwonder commented 2 months ago

Environment

Errors from error.log/crash.log

```erlang
2024-04-24 19:30:03.482608+05:30 [error] <0.19407.4730>@ejabberd_hooks:safe_apply/4:324 Hook c2s_auth_result crashed when running mod_fail2ban:c2s_auth_result/3:
** exception error: bad argument
   in function  ets:delete/2
      called as ets:delete(failed_auth,{0,0,0,0,0,65535,46268,60559})
      *** argument 1: the table identifier does not refer to an existing ETS table
   in call from mod_fail2ban:c2s_auth_result/3 (src/mod_fail2ban.erl, line 85)
   in call from ejabberd_hooks:safe_apply/4 (src/ejabberd_hooks.erl, line 320)
   in call from ejabberd_hooks:run_fold1/4 (src/ejabberd_hooks.erl, line 301)
   in call from xmpp_stream_in:process_sasl_success/3 (src/xmpp_stream_in.erl, line 936)
   in call from xmpp_stream_in:handle_info/2 (src/xmpp_stream_in.erl, line 411)
   in call from p1_server:handle_msg/8 (src/p1_server.erl, line 696)
   in call from proc_lib:init_p_do_apply/3 (proc_lib.erl, line 240)
** Arg 1 = #{stream_encrypted => true, tls_required => true, lang => <<"en-US">>,
             mgmt_queue_type => ram, mod => ejabberd_c2s, stream_version => {1,0},
             stream_authenticated => false, stream_id => <<"17337282484036752342">>,
             xmlns => <<"jabber:client">>, csi_queue => {0,#{}}, zlib => false,
             tls_enabled => false, mgmt_stanzas_req => 0, mgmt_state => inactive,
             mgmt_resend => false,
             sasl_state =>
                 {sasl_state,<<"xxxx.yyyy.zzz">>,<<"SCRAM-SHA-1">>,
                  {state,4,sha,false,<<>>,
                   <<102,239,95,115,78,151,116,204,98,14,74,98,97,8,83,140,58,108,84,245>>,
                   <<98,68,104,191,113,77,110,178,129,156,80,67,216,223,49,58,213,97,101,209>>,
                   <<"xxxxxxxxxxx">>,ejabberd_auth_sql,#Fun,
                   <<"n=xxxxxxxxxxx,r=Lrx1zyj7!\"GG({skS~R]WF<&4Jq)x!j*,r=Lrx1zyj7!\"GG({skS~R]WF<&4Jq)x!j*8GQJr3IAr5F6Ry/x2+ZdIQ==,s=77P3hWx8CPVpjLxyVOkRrw==,i=4096">>,
                   <<"Lrx1zyj7!\"GG({skS~R]WF<&4Jq)x!j*">>,
                   <<"8GQJr3IAr5F6Ry/x2+ZdIQ==">>},
                  #Fun, #Fun, #Fun},
             mgmt_max_timeout => 15000, auth_module => ejabberd_auth_sql,
             user => <<>>, sasl_mech => <<"SCRAM-SHA-1">>, owner => <0.19407.4730>,
             mgmt_stanzas_in => 0, tls_verify => false, shaper => c2s_shaper,
             ip => {{0,0,0,0,0,65535,46268,60559},57798},
             stream_header_sent => true, csi_state => active, resource => <<>>,
             access => c2s,
             socket =>
                 {socket_state,fast_tls,
                  {tlssock,#Port<0.154009556>,#Ref<0.3982367705.1902247940.190492>},
                  65536,#Ref<0.3982367705.1902247940.190011>,
                  {state,32000,32000,32000,1713967203482216},
                  {{{0,0,0,0,0,65535,2807,35582},443},
                   {{0,0,0,0,0,65535,46268,60559},57798}}},
             lserver => <<"xxxx.yyyy.zzzzz">>,
             socket_monitor => #Ref<0.3982367705.1902116868.190012>,
             codec_options => [ignore_els], mgmt_stanzas_out => 0,
             tls_options => [compression_none], mgmt_timeout => 15000,
             stream_direction => in, stream_restarted => true, mgmt_max_queue => 5000,
             mgmt_ack_timeout => 60000, pres_a => {0,nil},
             stream_state => wait_for_sasl_response, server => <<"xxxx.yyyy.zzzzz">>,
             stream_timeout => {40000,-572903896940}, stream_compressed => false}
** Arg 2 = true
** Arg 3 = <<"xxxxxxxxxxx">>
2024-04-24 19:30:03.484898+05:30 [error] <0.3178.4726>@ejabberd_hooks:safe_apply/4:324 Hook c2s_stream_started crashed when running mod_fail2ban:c2s_stream_started/2:
** exception error: bad argument
   in function  ets:lookup/2
      called as ets:lookup(failed_auth,{0,0,0,0,0,65535,38971,5906})
      *** argument 1: the table identifier does not refer to an existing ETS table
   in call from mod_fail2ban:c2s_stream_started/2 (src/mod_fail2ban.erl, line 91)
   in call from ejabberd_hooks:safe_apply/4 (src/ejabberd_hooks.erl, line 320)
   in call from ejabberd_hooks:run_fold1/4 (src/ejabberd_hooks.erl, line 301)
   in call from xmpp_stream_in:process_stream/2 (src/xmpp_stream_in.erl, line 636)
   in call from xmpp_stream_in:handle_info/2 (src/xmpp_stream_in.erl, line 355)
   in call from p1_server:handle_msg/8 (src/p1_server.erl, line 696)
   in call from proc_lib:init_p_do_apply/3 (proc_lib.erl, line 240)
** Arg 1 = #{stream_encrypted => false, tls_required => true, lang => <<"en-IN">>,
             mgmt_queue_type => ram, mod => ejabberd_c2s, stream_version => {1,0},
             stream_authenticated => false, stream_id => <<"141095349895579897">>,
             xmlns => <<"jabber:client">>, csi_queue => {0,#{}}, zlib => false,
             tls_enabled => false, mgmt_stanzas_req => 0, mgmt_state => inactive,
             mgmt_resend => false, mgmt_max_timeout => 15000, user => <<>>,
             owner => <0.3178.4726>, mgmt_stanzas_in => 0, tls_verify => false,
             shaper => c2s_shaper, ip => {{0,0,0,0,0,65535,38971,5906},40462},
             stream_header_sent => true, csi_state => active, resource => <<>>,
             access => c2s,
             socket =>
                 {socket_state,gen_tcp,#Port<0.153934479>,65536,
                  #Ref<0.3982367705.2325610497.117400>,
                  {state,32000,32000,32000,1713967203484662},
                  {{{0,0,0,0,0,65535,2807,35582},443},
                   {{0,0,0,0,0,65535,38971,5906},40462}}},
             lserver => <<"xxxx.yyyy.zzzzz">>,
             socket_monitor => #Ref<0.3982367705.2325479425.117401>,
             codec_options => [ignore_els], mgmt_stanzas_out => 0,
             tls_options => [compression_none], mgmt_timeout => 15000,
             stream_direction => in, stream_restarted => false, mgmt_max_queue => 5000,
             mgmt_ack_timeout => 60000, pres_a => {0,nil},
             stream_state => wait_for_stream, server => <<"xxxx.yyyy.zzzzz">>,
             stream_timeout => {40000,-572903895884}, stream_compressed => false}
** Arg 2 = {stream_start,
            {jid,<<"xxxxxxxxxxx">>,<<"xxxx.yyyy.zzzzz">>,<<>>,
             <<"xxxxxxxxxxx">>,<<"xxxx.yyyy.zzzzz">>,<<>>},
            {jid,<<>>,<<"xxxx.yyyy.zzzzz">>,<<>>,<<>>,
             <<"xxxx.yyyy.zzzzz">>,<<>>},
            <<>>,
            {1,0},
            <<"jabber:client">>,
            <<"http://etherx.jabber.org/streams">>,<<>>,
            <<"en-IN">>}
```

Bug description

I am running an ejabberd cluster with 3 nodes. For testing purposes, the mod_fail2ban module was disabled in ejabberd.yml and the configuration was reloaded on all three nodes.

After a day, the module was re-enabled in ejabberd.yml on all three nodes and the configuration was reloaded. Since then, I have been observing the above errors continuously. The errors disappear on a node when mod_fail2ban is disabled there.

Please help.

badlop commented 2 months ago

The failed_auth ets table is created when mod_fail2ban starts, and it is deleted when the module is stopped and no longer running on any other vhost in ejabberd.

When a client authenticates correctly, any row related to its IP address is removed from that table.

Looking at your error message, it appears that when a client authenticates correctly and the module tries to remove the row from the table, the table does not exist at all.
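For illustration, the crash could be avoided with a defensive check before touching the table. This is a hedged sketch, not the actual mod_fail2ban code; `remove_failed_auth/1` and the `Addr` argument (the client IP tuple from the log) are hypothetical names:

```erlang
%% Sketch: only touch the table if it still exists on this node.
%% ets:whereis/1 (OTP 21+) returns 'undefined' when the named
%% table is gone, which avoids the badarg crash seen in the log.
remove_failed_auth(Addr) ->
    case ets:whereis(failed_auth) of
        undefined -> ok;                         %% table missing: nothing to clean
        _Tid -> ets:delete(failed_auth, Addr), ok
    end.
```

Of course, the real question is why the table is missing in the first place, which is what the checks below try to answer.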

I wonder: does that ets table exist or not? You can check by attaching a debug shell to your ejabberd node with ejabberdctl:

$ ejabberdctl debug

(ejabberd@localhost)1> ets:info(failed_auth).
[{id,#Ref<0.2980915192.329908227.29410>},
 {decentralized_counters,false},
 {read_concurrency,false},
 {write_concurrency,false},
 {compressed,false},
 {memory,305},
 {owner,<0.1107.0>},
 {heir,<0.881.0>},
 {name,failed_auth},
 {size,0},
 {node,ejabberd@localhost},
 {named_table,true},
 {type,set},
 {keypos,1},
 {protection,public}]

ets tables are local, not clustered like mnesia tables. That table should exist in each of your ejabberd nodes, and it may contain different data in each node.
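Since the table is node-local, you can query every node in the cluster from one shell. A hedged sketch using the standard `rpc:multicall/4` (the result order matches the node list; `undefined` means the table is missing on that node):

```erlang
%% Ask each cluster node whether its local failed_auth table exists.
%% Replies is a list of table ids or 'undefined', one per node.
{Replies, BadNodes} =
    rpc:multicall([node() | nodes()], ets, whereis, [failed_auth]).
```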

You can also view the table content. In my case, my server got only one authentication failure:

(ejabberd@localhost)2> ets:tab2list(failed_auth).
[{{0,0,0,0,0,65535,32512,1},1,1714044872774,20}]

When you disable the module and restart ejabberd, is the table removed?

When you enable the module again and restart ejabberd, is the table created?

logicwonder commented 1 month ago

Apologies for my late reply.

When you disable the module and restart ejabberd, is the table removed?

Yes

When you enable the module again and restart ejabberd, is the table created?

Out of the three nodes, the table got created again only on the third node. The module has been reverted to disabled on node1 and node2. I tried enabling and disabling it multiple times on those two nodes, but the table is not getting created.

See the command output below:


(ejabberd@node1)1> ets:tab2list(failed_auth).
** exception error: bad argument
     in function  ets:match_object/2
        called as ets:match_object(failed_auth,'_')
        *** argument 1: the table identifier does not refer to an existing ETS table
     in call from ets:tab2list/1 (ets.erl, line 771)

(ejabberd@node2)1> ets:tab2list(failed_auth).
** exception error: bad argument
     in function  ets:match_object/2
        called as ets:match_object(failed_auth,'_')
        *** argument 1: the table identifier does not refer to an existing ETS table
     in call from ets:tab2list/1 (ets.erl, line 771)

(ejabberd@node3)1> ets:tab2list(failed_auth).
[{{0,0,0,0,0,65535,10857,62538},1,1716022340131,20}]
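On a node where the table is missing, one thing worth trying from the debug shell is restarting just the module so its start routine recreates the table. This is a hedged sketch assuming ejabberd's `gen_mod:stop_module/2` and `gen_mod:start_module/2` internal API; the vhost binary is taken from the logs above:

```erlang
%% Sketch: restart mod_fail2ban on the affected node only.
Host = <<"xxxx.yyyy.zzzzz">>,            %% vhost name from the error log
gen_mod:stop_module(Host, mod_fail2ban),
gen_mod:start_module(Host, mod_fail2ban),
%% Check whether the table now exists ('undefined' means it does not):
ets:whereis(failed_auth).
```
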

logicwonder commented 1 month ago

I removed the problematic nodes from the cluster and isolated them. Then Mnesia was cleared and the nodes were restarted after enabling mod_fail2ban. The failed_auth ets table (as listed by ets:tab2list) got recreated. The nodes were then joined back into the cluster, and the problem was resolved.

Thanks @badlop for providing the commands for checking the issue.