mdevilliers / SignalR.RabbitMq

MessageBus implementation using RabbitMq as the backing store.
MIT License
89 stars 40 forks source link

Multiple messages to the same client sent during the same millisecond are lost #9

Closed kevingorski closed 11 years ago

kevingorski commented 11 years ago

With local clients and close RabbitMQ server, I've seen every message after the first generated in the same millisecond on the server fail to be delivered. I want to blame the UniqueMessageIdentifierGenerator, but I really don't know much about message de-duping in SignalR. I'll see if I can put together a sample to repro.

kevingorski commented 11 years ago

Maybe it's as simple as changing

if (toReturn < _lastSeenMessageIdentifier)

to

if (toReturn <= _lastSeenMessageIdentifier)

Since GenerateValue would return the same toReturn value in the same ms?

mdevilliers commented 11 years ago

Maybe..does your push request fix the issue when you are testing?

The _lastSeenMessageIdentifier is initially seeded of a tick value which represents one hundred nanoseconds or one ten-millionth of a second. There are 10,000 ticks in a millisecond and a thousand milliseconds a second.

One culprit could be

_bus.Subscribe<RabbitMqMessageWrapper>(_queue,
                (msg, messageReceivedInfo) =>
                    Task.Factory.StartNew(() =>  OnMessage(msg.Body)));

So it could be a race condition that when receiving a message forwards them out of order? Theoretically the messages should be queued in order on Rabbit so we just need to take them off and forward them as fast as we can. I was paranoid about not wanting rabbit to queue as the performance drops off spectacularly if it does....

The signalr code for 0.5 used to have some unit tests that tests this but I haven't checked out the 1.1.0 code. I'll need to investigate...

If you have a reproducible test could you try the code without starting the new thread?

kevingorski commented 11 years ago

I was able to confirm that messages sent within the same millisecond (as seen in a log message) were still delivered, but it's possible that some of the messages were actually sent in the same "tick" due to the difference between the accuracy and precision of DateTime.Ticks.

This was all within my app, so I don't have standalone recreation code to share, unfortunately.

-edit-

I'm still having messages get dropped, which might point to a separate issue, possibly on my side.

kevingorski commented 11 years ago

After adding much more logging, it looks like the messages are not being sent in message.Id order (as determined from logging in RabbitMqMessageBus.Send) nor received in the order sent (logging in ForwardOnReceivedMessage). This is all on one machine.

2013-05-21 18:04:42.1419 INFO Sending 1 messages id 635047778821499846 
2013-05-21 18:04:42.1419 INFO Sending 1 messages id 635047778821499847 
2013-05-21 18:04:42.1419 INFO Sending 1 messages id 635047778821509847 
2013-05-21 18:04:42.1419 INFO Sending 1 messages id 635047778821499846 <= dupe of first id
2013-05-21 18:04:42.1419 INFO Sending 1 messages id 635047778821519848 
2013-05-21 18:04:42.1419 INFO Sending 1 messages id 635047778821529849 
2013-05-21 18:04:42.3140 INFO Received message id 635047778821499847 
2013-05-21 18:04:42.3140 INFO Received message id 635047778821499846 
2013-05-21 18:04:42.3140 INFO Received message id 635047778821519848 
2013-05-21 18:04:42.3140 INFO Received message id 635047778821509847 
2013-05-21 18:04:42.3140 INFO Received message id 635047778821499846 
2013-05-21 18:04:42.3620 INFO Received message id 635047778821529849 
mdevilliers commented 11 years ago

Cool. Good find.

I've added a branch "issue9" and added in some logging round the unique id generation plus a hopeful fix - i don't think I was properly paying attention when I wrote that.

Would you have a chance to test it out in your application and see if it fixes the issue? It seems to initially in my test harness but a real world app is a lot better.

Thanks for the link...very interesting.

kevingorski commented 11 years ago

I'll try it out this morning.

kevingorski commented 11 years ago

I pulled down latest on issue9 and appear to be hitting a deadlock whenever I try to refresh the page where a SignalR connection has already been made. I don't understand why, but maybe you'll have an idea.

mdevilliers commented 11 years ago

Back to the drawing board...

mdevilliers commented 11 years ago

I've reinstated some of the new thread on the sending of messages which I believe was the cause of the deadlock. If you have a spare time today could you retest as this will at least test the changes to the unique message identifier? However I think this part might need to be rethought/designed as it nags at me as being very hacky.

kevingorski commented 11 years ago

I retested and the deadlock behavior is gone, but messages still appear to be getting dropped because of id ordering.

For now I'm going to try to get the Redis backplane up and see if I have more luck.

mdevilliers commented 11 years ago

Hey Kevin,

I think I've managed to resolve this now.

Essentially to resolve the issue I've needed to develop a custom exchange type in RabbitMQ that is used to guarantee order of the messages.

The Redis message bus does something similar e.g. gains a unique number from Redis and sends the message as a transaction.

The deployment is a bit more involved e.g. adding a custom plugin to Rabbit but it seems to be the only way to do this.

I understand it might be too late for you and I do apologize but if it wasn't could you find some time to test the new version?

Thanks,

Mark

kevingorski commented 11 years ago

Mark - Yeah, it's too late for my implementation to switch back, and I don't think getting hosting to install a custom queue type would be trivial. I hope it works out, though.