processone / ejabberd

Robust, Ubiquitous and Massively Scalable Messaging Platform (XMPP, MQTT, SIP Server)
https://www.process-one.net/en/ejabberd/

Some Rooms crash but still PID in mnesia #3351

Open suf4dev opened 4 years ago

suf4dev commented 4 years ago

Environment

Errors from error.log/crash.log

2020-08-03 02:23:24 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.1148.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 02:27:25 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.656.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 02:28:55 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.658.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 02:32:26 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.659.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 02:40:58 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.959.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 02:42:29 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.709.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 02:54:02 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.667.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 03:27:40 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.1238.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 04:05:48 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.1428.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 04:14:20 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.1459.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 06:04:49 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.1077.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 06:04:49 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.1306.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 06:35:28 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.679.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 06:50:32 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.663.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 07:05:36 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.666.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

2020-08-03 07:13:09 =SUPERVISOR REPORT====
     Supervisor: {local,'mod_muc_room_sup_chat-server1.net'}
     Context:    child_terminated
     Reason:     killed
     Offender:   [{pid,<0.705.0>},{id,undefined},{mfargs,{mod_muc_room,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

Bug description

Hello. When I try to get the config of a room, it returns {error,not_found}, but the room PID is still in Mnesia. The problem occurs when flooders make the room crash. I have been struggling with this for a week and have not found a solution. I have tried OTP 21.2.7 and OTP 23.0.3. Now most of the rooms have disappeared from the room list but are still in the MySQL database. I have to restart the whole server for the rooms to show up again, but after some time some rooms crash again.

badlop commented 4 years ago

The problem is easy to reproduce:

  1. Create a new room, for example joining room1
  2. In Erlang shell, execute something like: exit(element(2, mod_muc:find_online_room(<<"room1">>, <<"conference.localhost">>)), kill).
  3. It shows an error message like what you mentioned...
  4. but nothing else! The mod_muc service is not aware of the room crash, the muc_online_room Mnesia table still mentions the room, and the occupants were not informed about the room crash.
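
The stale entry is consistent with how `exit(Pid, kill)` works in Erlang: the signal cannot be trapped, so the killed process never runs any cleanup of its own. A registry only learns about the death if some other process monitors or links to the room. The following is a minimal, self-contained sketch of that idea. The module `room_registry_sketch` and its message protocol are hypothetical and are not ejabberd's `mod_muc` code; how `mod_muc` actually tracks rooms is not shown in this thread.

```erlang
-module(room_registry_sketch).
-export([demo/0]).

%% Hypothetical stand-in for a room registry (not mod_muc).
%% It monitors every registered pid, so it receives a 'DOWN' message
%% even when the room is terminated with exit(Pid, kill).
registry_loop(Rooms) ->
    receive
        {register, Name, Pid} ->
            _Ref = erlang:monitor(process, Pid),
            registry_loop(Rooms#{Name => Pid});
        {'DOWN', _Ref, process, Pid, _Reason} ->
            %% Drop every entry that still points at the dead process.
            registry_loop(maps:filter(fun(_, V) -> V =/= Pid end, Rooms));
        {lookup, Name, From} ->
            From ! {lookup_result, maps:find(Name, Rooms)},
            registry_loop(Rooms);
        stop ->
            ok
    end.

lookup(Registry, Name) ->
    Registry ! {lookup, Name, self()},
    receive {lookup_result, Result} -> Result end.

%% Poll until the entry disappears; the 'DOWN' message is asynchronous.
wait_gone(_Registry, _Name, 0) ->
    still_registered;
wait_gone(Registry, Name, Retries) ->
    case lookup(Registry, Name) of
        error -> removed;
        {ok, _} -> timer:sleep(10), wait_gone(Registry, Name, Retries - 1)
    end.

demo() ->
    Registry = spawn(fun() -> registry_loop(#{}) end),
    Room = spawn(fun() -> receive stop -> ok end end),
    Registry ! {register, <<"room1">>, Room},
    {ok, Room} = lookup(Registry, <<"room1">>),
    exit(Room, kill),                      % same signal as in step 2
    Result = wait_gone(Registry, <<"room1">>, 100),
    Registry ! stop,
    Result.
```

Calling `room_registry_sketch:demo()` after compiling the module exercises the register, kill, and cleanup path. Without the `erlang:monitor/2` call, the entry for `room1` would stay in the map after the kill, which mirrors the behaviour described in step 4.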

suf4dev commented 4 years ago

> The problem is easy to reproduce:
>
>   1. Create a new room, for example joining room1
>   2. In Erlang shell, execute something like: exit(element(2, mod_muc:find_online_room(<<"room1">>, <<"conference.localhost">>)), kill).
>   3. It shows an error message like what you mentioned...
>   4. but nothing else! The mod_muc service is not aware of the room crash, the muc_online_room Mnesia table still mentions the room, and the occupants were not informed about the room crash.

The shell returns just "true" and nothing else. There is no crash log.

suf4dev commented 4 years ago

I have determined what causes the room crashes. When the room CAPTCHA is enabled, some users use tools to request too many CAPTCHAs. After a few minutes the room crashes and stays down until the server is restarted.

badlop commented 4 years ago

Did you find in the logs any specific error lines related to those captcha crashes? Or the only thing found in the logs are the lines you already copied in the ticket description?

suf4dev commented 4 years ago

> Did you find in the logs any specific error lines related to those captcha crashes? Or the only thing found in the logs are the lines you already copied in the ticket description?

No, only the lines copied in the ticket description.

mremond commented 3 years ago

@badlop I think we should rate-limit captcha generation per user to mitigate this issue.
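
One way such a limit could look is a fixed-window counter keyed by the requesting user. The sketch below is an illustration only, under stated assumptions: the module `captcha_rate_limit_sketch`, its functions, and the example limit of 2 requests per 60 seconds are all hypothetical and are not an existing ejabberd API or option. Timestamps are passed in explicitly so the logic stays pure and easy to check.

```erlang
-module(captcha_rate_limit_sketch).
-export([new/2, allow/3, demo/0]).

%% Hypothetical limiter state: at most Max requests (Max >= 1) per
%% WindowMs milliseconds, tracked per key as {WindowStart, Count}.
new(Max, WindowMs) ->
    #{max => Max, window_ms => WindowMs, seen => #{}}.

%% Returns {allow | deny, UpdatedLimiter} for a request by Key at NowMs.
allow(Key, NowMs, #{max := Max, window_ms := W, seen := Seen} = L) ->
    case maps:get(Key, Seen, undefined) of
        {Start, Count} when NowMs - Start < W, Count >= Max ->
            %% Window still open and quota used up.
            {deny, L};
        {Start, Count} when NowMs - Start < W ->
            %% Window still open, quota left: count this request.
            {allow, L#{seen := Seen#{Key => {Start, Count + 1}}}};
        _ ->
            %% First request from this key, or the old window expired.
            {allow, L#{seen := Seen#{Key => {NowMs, 1}}}}
    end.

demo() ->
    L0 = new(2, 60000),
    User = <<"flooder@example.org">>,      % hypothetical JID
    {allow, L1} = allow(User, 0, L0),
    {allow, L2} = allow(User, 1000, L1),
    {deny, L3} = allow(User, 2000, L2),    % third request inside the window
    {allow, _} = allow(User, 61000, L3),   % a new window has started
    ok.
```

A fixed window is the simplest variant; it permits short bursts at window boundaries, so a token bucket or sliding window may be a better fit if that matters. The map also grows with the number of distinct users, so a real implementation would need to expire old entries.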