rabbitmq / rabbitmq-server

Open source RabbitMQ: core server and tier 1 (built-in) plugins
https://www.rabbitmq.com/

Stateless segment auth cache crashes on `gc` #3267

Open luos opened 3 years ago

luos commented 3 years ago

Hi,

We were trying to use rabbit_auth_cache_ets_segmented_stateless, but it crashes when gc is called. Even with the crash fixed, it leaks table refs in the rabbit_auth_cache_ets_segmented_stateless_segment_table table. Tested with the 3.9.19 branch, but the code has not changed since.

2021-08-05 15:58:45.866 [error] <0.4791.0> ** Generic server rabbit_auth_cache_ets_segmented_stateless terminating
** Last message in was gc
** When Server state == {state,{interval,#Ref<0.4277322854.3740794881.249325>}}
** Reason for termination ==
** {badarg,[{ets,delete,[#Ref<0.4277322854.3740925953.249323>],[]},{rabbit_auth_cache_ets_segmented_stateless,'-handle_info/2-lc$^0/1-0-',1,[{file,"src/rabbit_auth_cache_ets_segmented_stateless.erl"},{line,79}]},{rabbit_auth_cache_ets_segmented_stateless,handle_info,2,[{file,"src/rabbit_auth_cache_ets_segmented_stateless.erl"},{line,80}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,680}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,756}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,226}]}]}
2021-08-05 15:58:45.867 [error] <0.4791.0> CRASH REPORT Process rabbit_auth_cache_ets_segmented_stateless with 0 neighbours crashed with reason: bad argument in call to ets:delete(#Ref<0.4277322854.3740925953.249323>) in rabbit_auth_cache_ets_segmented_stateless:'-handle_info/2-lc$^0/1-0-'/1 line 79
2021-08-05 15:58:45.867 [error] <0.924.0> Supervisor rabbit_auth_backend_cache_app had child auth_cache started with rabbit_auth_cache_ets_segmented_stateless:start_link(10000) at <0.4791.0> exit with reason bad argument in call to ets:delete(#Ref<0.4277322854.3740925953.249323>) in rabbit_auth_cache_ets_segmented_stateless:'-handle_info/2-lc$^0/1-0-'/1 line 79 in context child_terminated

It crashes on this line: https://github.com/rabbitmq/rabbitmq-server/blob/ad9b4aafb552aabd1f09b053b8e97c18936a9fdd/deps/rabbitmq_auth_backend_cache/src/rabbit_auth_cache_ets_segmented_stateless.erl#L77

The reason is that the segment table ref is never removed from the ?SEGMENT_TABLE table. The first gc call deletes the expired segment table itself, and the next gc call finds the same stale ref and crashes when it tries to delete the table again.
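To illustrate, here is a minimal, self-contained sketch of one possible fix, not the actual patch. It assumes the segment table holds `{ExpiryMillis, SegmentTableRef}` rows, which is how I read the module. The module name `segment_gc_sketch`, the table name, and the `gc/1` helper are made up for the example. The idea is to delete the expired rows from the segment table in the same gc pass that deletes the segment tables they point to, so a second gc never sees a stale ref.

```erlang
-module(segment_gc_sketch).
-export([demo/0, gc/1]).

%% Hypothetical table name used only by this sketch.
-define(SEGMENT_TABLE, segment_gc_sketch_segments).

%% Deletes every expired segment table and also removes its row from
%% ?SEGMENT_TABLE, so the ref cannot be picked up again by a later gc.
gc(Now) ->
    SelectSpec = [{{'$1', '$2'}, [{'<', '$1', {const, Now}}], ['$2']}],
    Expired = ets:select(?SEGMENT_TABLE, SelectSpec),
    lists:foreach(fun ets:delete/1, Expired),
    %% Skipping this step leaves stale refs behind; the next gc would
    %% then call ets:delete/1 on an already deleted table (badarg).
    DeleteSpec = [{{'$1', '_'}, [{'<', '$1', {const, Now}}], [true]}],
    _ = ets:select_delete(?SEGMENT_TABLE, DeleteSpec),
    ok.

demo() ->
    ?SEGMENT_TABLE = ets:new(?SEGMENT_TABLE,
                             [ordered_set, named_table, public]),
    Now = erlang:system_time(millisecond),
    ExpiredSeg = ets:new(segment, [set, public]),
    LiveSeg = ets:new(segment, [set, public]),
    true = ets:insert(?SEGMENT_TABLE,
                      [{Now - 1000, ExpiredSeg}, {Now + 60000, LiveSeg}]),
    ok = gc(Now),
    %% A second gc at the same time must not crash.
    ok = gc(Now),
    %% The expired segment is gone, the live one is still registered.
    undefined = ets:info(ExpiredSeg),
    [{_, LiveSeg}] = ets:tab2list(?SEGMENT_TABLE),
    true = ets:delete(LiveSeg),
    true = ets:delete(?SEGMENT_TABLE),
    ok.
```

Calling `segment_gc_sketch:demo()` runs gc twice over the same segment table. Removing the `ets:select_delete/2` call from `gc/1` should reproduce the badarg from the log above on the second call.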

The used config was the following:

auth_backends.1 = cache
auth_cache.cached_backend = internal
auth_cache.cache_module = rabbit_auth_cache_ets_segmented_stateless

[{lager, [
    {error_logger_hwm, 200}
]},
 {rabbitmq_auth_backend_cache, [{cache_module_args, [10000]}]}
].

michaelklishin commented 3 years ago

This was developed mostly as a PoC that different K/V approaches would be feasible. You may be the only known user going with this module into production ;)

luos commented 3 years ago

Thankfully it did not make it that far. Actually, we were looking for a solution to https://github.com/rabbitmq/rabbitmq-server/pull/2792, but now that the timeout seems to have been increased, hopefully it will behave better. I still like the auth backends a bit better, though, as timeouts can still happen. :)

Do you know if anyone uses the ets_segmented one?

Marking the other modules with "This module is for demonstration only and should not be used in production.", as the dict one already is, would make sense in this case. What do you think?

michaelklishin commented 3 years ago

@luos most people never override the default. We can mark all alternative ones as such.