tonyg / rabbithub

Experimental RabbitMQ PubSubHubBub interface

Race around mnesia:create_schema, mnesia:create_table #1

Open squaremo opened 14 years ago

squaremo commented 14 years ago

On my mac, RabbitMQ with rabbithub as a plugin often (nearly always, in fact) fails to start. It aborts in rabbithub_app:setup_schema while calling mnesia:create_table, and gives {bad_type, rabbithub_lease, disc_copies, rabbit@localhost} as the reason.

It's easy to elicit this from mnesia:

    $ erl
    1> mnesia:start().
    ok
    2> mnesia:create_table(foo, [{attributes, [bar, baz]}, {disc_copies, [node()]}]).
    {aborted,{bad_type,foo,disc_copies,nonode@nohost}}

This doesn't happen if you've called mnesia:create_schema/1 before starting mnesia. But wait -- that's exactly what setup_schema does!

I suspect there's a race between RabbitMQ calling mnesia:create_schema/1 and mnesia:start/0, and rabbithub doing so then trying to create the table.

squaremo commented 14 years ago

It's not just my mac. I can make this happen with the following steps, in rabbitmq-server:

To make it work again,

squaremo commented 14 years ago

This looks like a bug (er, feature?) in mnesia:

    $ erl
    Erlang R13B02 (erts-5.7.3) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

    Eshell V5.7.3 (abort with ^G)
    1> mnesia:start().
    ok
    2> mnesia:create_schema([node()]).
    {error,{nonode@nohost,{already_exists,nonode@nohost}}}
    3> mnesia:create_table(foo, [{attributes, [bar, baz]}, {disc_copies, [node()]}]).
    {aborted,{bad_type,foo,disc_copies,nonode@nohost}}

The problem is that create_schema reports 'already_exists', even though it didn't in fact create the schema! The rabbithub code matches on that and proceeds, presuming it means what it says.

The solution is to call mnesia:stop() before create_schema; if mnesia is stopped, it will create the schema properly.
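
A minimal sketch of that ordering, in case it helps anyone reproduce the workaround. The module name `schema_sketch` and the function `ensure_disc_table/2` are made up for illustration and are not rabbithub's actual code. It also assumes nothing else on the node depends on mnesia staying up while this runs, which is probably not true inside a running RabbitMQ broker, so treat it as something to try in a plain `erl` shell.

```erlang
-module(schema_sketch).
-export([ensure_disc_table/2]).

%% Hypothetical helper (not part of rabbithub): make sure a disc schema
%% exists on this node, then create a disc_copies table on it.
ensure_disc_table(Tab, Attributes) ->
    %% create_schema/1 only does real work while mnesia is stopped.
    %% mnesia:stop/0 returns 'stopped' even if mnesia was not running.
    stopped = mnesia:stop(),
    case mnesia:create_schema([node()]) of
        ok ->
            ok;
        {error, {_Node, {already_exists, _}}} ->
            %% With mnesia stopped, this means a schema really is on disc.
            ok;
        {error, SchemaReason} ->
            exit({create_schema_failed, SchemaReason})
    end,
    ok = mnesia:start(),
    case mnesia:create_table(Tab, [{attributes, Attributes},
                                   {disc_copies, [node()]}]) of
        {atomic, ok} ->
            ok;
        {aborted, {already_exists, Tab}} ->
            %% Table survives from an earlier run; nothing to do.
            ok;
        {aborted, TableReason} ->
            {error, TableReason}
    end.
```

Calling `schema_sketch:ensure_disc_table(foo, [bar, baz])` in a fresh shell should take the place of the failing `create_table` call in the transcripts above. Note that it writes a Mnesia directory for the node into the current working directory.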

tonyg commented 14 years ago

So looking at git blame rabbithub.erl, it looks like the boot_step was added at around the same time you were posting this report. I suspect the boot step fixes the issue: is that true? Can you still reproduce it with the latest revision?