mdevilliers / SignalR.RabbitMq

MessageBus implementation using RabbitMq as the backing store.
MIT License
89 stars 40 forks source link

Backplane fails to recover after disconnect #44

Closed Entroper closed 8 years ago

Entroper commented 8 years ago

We've been testing our server's ability to recover when nodes in a cluster go down. Recently we fixed an issue in RabbitMq.Client where their auto-recovering model was crashing on stale delivery tags. Before this fix, we observed that our pub/sub channels were not reconnecting in pairs, we would only get one channel back up after recovery.

After the fix, most of our Rabbit connections were recovering fine, but we noticed SignalR was only opening 1 channel instead of the usual 2. I noticed SignalR.RabbitMq was connecting with EasyNetQ, which disables the auto-recovering model.

Long story short, we decided to try and replace EasyNetQ with just the base RabbitMq.Client, and it worked great, we're able to recover normally when a node goes down.

https://github.com/Entroper/SignalR.RabbitMq/commit/373b0827d85cc65efbccfbc44dfe5922696ca59d?w=1

This is kind of a disruptive change, so rather than opening a PR, I thought I would just show our solution and open a conversation. It's my understanding that the auto-recovering model is fairly new to RabbitMq.Client, and EasyNetQ has its own recovery strategy.

mdevilliers commented 8 years ago

Hey,

Thanks for getting in touch.

From memory using the EasyNetQ library was 90% due to its re-connection ability. As that is built into the RabbitMQ.Client I would be pretty keen to drop the dependency. It also sounds like their reconnect functions seem to be competing with each other.

I would love a PR with your changes - especially as it means overall less code to look after!

I do have some queries (bear in mind I'm being lazy here)

Thanks again for your time,

Mark

Entroper commented 8 years ago

Splendid!

I have some more testing and some cleanup to do, but I'll get the PR in soon. I'll check on the other nuget references.

Come to think of it, I'm actually hard-referencing our internal build of RabbitMQ.Client for the moment, since that includes the fix for the stale tag issue. Might not want to merge this until that fix is pushed to nuget.

Entroper commented 8 years ago

Apparently the fix was already pushed in RabbitMQ.Client 3.6.3 last week, so I can just update to that version.

Entroper commented 8 years ago

So, the base project was already not dependent on Newtonsoft. The console and example projects required Newtonsoft because of SignalR.

mdevilliers commented 8 years ago

I've merged your PR - thanks again!

mdevilliers commented 8 years ago

@Entroper if you get a moment could you try out the latest nuget file? It is on AppVeyor

https://ci.appveyor.com/project/mdevilliers/signalr-rabbitmq/build/artifacts

Thanks,

Mark

Entroper commented 8 years ago

@mdevilliers I'll try to remember to test the package on Monday. Thanks!

Entroper commented 8 years ago

@mdevilliers We tested yesterday and today, everything works great.

mdevilliers commented 8 years ago

Thanks for that - I'll make the new nuget release