mgroves / EmpiriCall

Collect information and analytics about code that is actually being called
MIT License

RabbitMQ Code Review #1

Closed. mgroves closed this issue 9 years ago.

mgroves commented 9 years ago

I am new to using RabbitMQ. I wrote a producer and a consumer as part of this project, and I want:

a) To make sure I'm doing it correctly.
b) To make sure I have a reasonable amount of configurability for the common use cases of RabbitMQ writing/reading.
c) To hear about alternatives that might be better, and why.

Here is the code that is doing the producing: https://github.com/mgroves/EmpiriCall/blob/master/EmpiriCall.Data.RabbitMQ/CommandHandlers/CommandHandlerAddRecord.cs

Here is the code that is doing the consuming: https://github.com/mgroves/EmpiriCall/blob/master/EmpiriCall.Data.RabbitMQ.Consumer/Program.cs (it is reading from the queue and writing records to SQL via Entity Framework).

joshrothe commented 9 years ago

Use EasyNetQ (http://easynetq.com/) for Rabbit.

Works like a champ with high volume (millions of calls), and it handles a lot of the heavy lifting for you. Subscriptions are lambdas and it's stupid simple.
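For reference, a minimal sketch of that style using the classic EasyNetQ IBus API; the CallRecord type, the connection string, and the subscription id are placeholders, not code from this repo:

```csharp
using System;
using EasyNetQ;

public class CallRecord              // hypothetical message type, not from this repo
{
    public string Method { get; set; }
}

public static class Example
{
    public static void Main()
    {
        using (var bus = RabbitHutch.CreateBus("host=localhost"))
        {
            // Consumer side: the subscription handler is just a lambda.
            bus.Subscribe<CallRecord>("empiricall", record => Console.WriteLine(record.Method));

            // Producer side: dump a message on the bus.
            bus.Publish(new CallRecord { Method = "HomeController.Index" });

            Console.ReadLine(); // keep the process (and the subscription) alive
        }
    }
}
```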

mgroves commented 9 years ago

That EasyNetQ looks nice, but I'm not sure a publish/subscribe is what I want. I want these messages to be queued and consumed at most one time.

joshrothe commented 9 years ago

Set the fetch count to 1. That enables FIFO. The publish is a bonus. Having done this now at web scale for three companies, I can vouch that it makes building Rabbit consumers way easier and handles exceptions gracefully, to the point that it didn't seem worth the effort to reinvent the wheel.


joshrothe commented 9 years ago

Your consumer just becomes a subscriber. Set the fetch count to 1 and process items off the queue one at a time as they come in. It's how we did real-time analytics where the order received mattered.
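For what it's worth, with the raw RabbitMQ .NET client the equivalent knob is BasicQos (a sketch, assuming an already-open channel; EasyNetQ exposes the same setting through its own configuration):

```csharp
// Limit each consumer to one unacknowledged message at a time;
// this is the "fetch count" being discussed above.
channel.BasicQos(prefetchSize: 0, prefetchCount: 1, global: false);
```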

mgroves commented 9 years ago

If a second consumer is somehow running, won't it also get the same message and process it?

joshrothe commented 9 years ago

How would that happen? Technically yes, unless you configure the queue as exclusive, which EasyNetQ has a builder option for.
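Roughly, with the raw client that would look like this (a sketch; the queue name is a placeholder and the channel is assumed to be open):

```csharp
// An exclusive queue is tied to the declaring connection, so a second consumer
// can't attach to it; it is also deleted when that connection closes.
channel.QueueDeclare(queue: "EmpiriCall", durable: false, exclusive: true,
                     autoDelete: false, arguments: null);
```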

joshrothe commented 9 years ago

EasyNetQ buys you a really nice fluent configuration for Rabbit, as well as the pub/sub stuff (which is in the Rabbit driver anyway), error processing, and some advanced bus features (polling, etc.). It's all stuff you can do with the driver yourself, but for us it came back to: why reinvent what's already been done?

joshrothe commented 9 years ago

Also, enable the Rabbit admin UI if you haven't yet. It's a lifesaver.
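(If it isn't already turned on, the management UI typically just needs its plugin enabled with `rabbitmq-plugins enable rabbitmq_management`; it then listens on port 15672 by default.)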

mgroves commented 9 years ago

My main concern with pub/sub is that I don't want a message processed twice, even if there are multiple consumers running somehow. That's why I went with BasicConsume/Dequeue. Am I mistaken?

P.S. I don't need real-time. As long as the messages get consumed eventually, I'm fine with that.

joshrothe commented 9 years ago

Well, it's not really pub/sub; it's Rabbit. And in re-reading your question: no, messages will not be processed twice, regardless of how they get in there. That's all managed by the Rabbit server. Rabbit uses the fetch count registered by each client and farms that number of items out to whichever client has available slots. Once a message is acked, it can no longer be processed by another consumer unless it is requeued or returned to an unacked state.
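To make that concrete, here is a bare-bones sketch of a consumer that acks each message only after processing it, using the raw RabbitMQ .NET client; the queue name and processing step are placeholders, not the repo's actual code:

```csharp
using System;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

var factory = new ConnectionFactory { HostName = "localhost" };
using (var connection = factory.CreateConnection())
using (var channel = connection.CreateModel())
{
    channel.BasicQos(0, 1, false); // at most one unacked message per consumer

    var consumer = new EventingBasicConsumer(channel);
    consumer.Received += (sender, ea) =>
    {
        // ... process ea.Body here, e.g. write the record to SQL ...
        channel.BasicAck(ea.DeliveryTag, multiple: false); // once acked, no other consumer can get it
    };

    channel.BasicConsume("EmpiriCall", false, consumer); // false = manual acks
    Console.ReadLine(); // keep the consumer alive
}
```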

joshrothe commented 9 years ago

Your concern is legit with a non-guaranteed-FIFO queue, like AWS SQS. Most AMQP brokers, Rabbit and even IBM's included, prevent duplicate message processing regardless of consumer count. Don't get caught up in the terms publish/subscribe in EasyNetQ: at the end of the day, all Publish does is dump a message on a Rabbit queue, and all Subscribe does is register a consumer to that queue and use a configurable polling interval to automatically pop messages off. The finer details of the actual queuing take place in the Rabbit server code. It's not a pub/sub like, say, XMPP.

joshrothe commented 9 years ago

And again, you can certainly do all of this yourself, straight up with the driver.

joshrothe commented 9 years ago

Also, I really didn't answer your initial request. From a code perspective, I think what's there looks fine. You just want to make sure you handle exceptions and don't break the channel, because it can get moody when that happens. Also make sure you put broken messages somewhere, otherwise they just float off.

I don't recall if it will auto-unack them if it breaks (for some reason I think it may, but only in certain circumstances). If it does, you can get into a nasty place where you just keep breaking and the queue backs up, or messages become acked but something broke, so now they are suspended in an acked state and can't be consumed. In that case, the admin UI is essential for figuring out what is going on. As a side note, we wound up writing payloads out to a capped collection in Mongo just so we could see what was going on... it's a thought.

Also, you probably want to make the queue durable to safeguard against failure (the perf hit is minimal), but be sure to mark each message durable too, or they won't be flushed to disk. Best of luck ;)
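On the durability point, a minimal sketch with the raw .NET client, assuming an open channel; the queue name and payload (messageBytes) are placeholders:

```csharp
// Durable queue: survives a broker restart (declare it the same way on both sides).
channel.QueueDeclare(queue: "EmpiriCall", durable: true, exclusive: false,
                     autoDelete: false, arguments: null);

// Persistent message: without this the broker won't write the message to disk,
// even though the queue itself is durable.
var props = channel.CreateBasicProperties();
props.DeliveryMode = 2; // 2 = persistent
channel.BasicPublish(exchange: "", routingKey: "EmpiriCall",
                     basicProperties: props, body: messageBytes);
```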

correl commented 9 years ago

The code looks straightforward enough to me. It looks like you're simply configuring a single queue on Rabbit and publishing to it via the default exchange. You won't have to worry about message duplication on the Rabbit side, at least. Publish/Subscribe in terms of the library mentioned above is entirely distinct from publish/subscribe in terms of message delivery via Rabbit (Rabbit has great info on how that all works in its tutorials).

I do recommend that your consumer perform its database inserts idempotently, though. Rabbit can guarantee a message will be delivered, and it can guarantee that it will only be delivered to a particular queue once, but it can't guarantee it will only be delivered to a consumer once. Your code can explicitly acknowledge a message, requeue it, or reject it to a dead letter exchange. Unacknowledged messages will be requeued automatically after a timeout.
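In code, those three outcomes look roughly like this with the raw .NET client (a sketch; deliveryTag comes from the delivery you received, and dead-lettering only happens if a dead letter exchange is configured on the queue):

```csharp
channel.BasicAck(deliveryTag, multiple: false);                  // done: remove it from the queue
channel.BasicNack(deliveryTag, multiple: false, requeue: true);  // put it back for redelivery
channel.BasicReject(deliveryTag, requeue: false);                // drop it, or dead-letter it if a DLX is set
```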

On the topic of delivery guarantees, I don't see you taking advantage of them. If the messages are critical, that may also be worth considering. Rabbit is capable of letting you know that a message was delivered to a queue (you can require that a queue be bound to the exchange to receive it) and, most importantly, that it was written to disk. Durable queues help too, as they persist across broker restarts rather than disappearing.
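A sketch of those guarantees with the raw client, reusing the placeholder names from the earlier snippets: publisher confirms tell you the broker accepted the message (and, for persistent messages on a durable queue, wrote it to disk), and mandatory routing bounces back messages that didn't reach any queue.

```csharp
channel.ConfirmSelect(); // enable publisher confirms on this channel

channel.BasicPublish(exchange: "", routingKey: "EmpiriCall", mandatory: true,
                     basicProperties: props, body: messageBytes);

channel.WaitForConfirmsOrDie(); // throws if the broker nacks or the confirm times out
```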

correl commented 9 years ago

Lastly, I noticed you're declaring your queue in both places. That's not necessarily bad, but you'll want to be absolutely certain both sides declare it the same way; otherwise, whichever one wins that race determines its configuration. It might be easier to configure a durable queue ahead of time and have the consumer just use that. The producer doesn't need any knowledge of it; it only needs to know which exchange and routing key to use (the default exchange simply uses your supplied routing key to find your named queue, since all queues are implicitly bound to the default exchange by name).

mgroves commented 9 years ago

I guess I should've stated this to begin with: I do not care if individual messages occasionally are lost. The idea is to get a "statistically significant" amount of data in aggregate. Performance is more critical than handling every single message. Not processing the same message more than once is the second most important thing. If a message occasionally "floats away", that is no problem. This isn't a logging framework.

correl commented 9 years ago

Ok. I wasn't sure, so I figured I'd cover all the bases. The burden of not processing a message more than once is on the consumer, since there is the possibility (e.g., in the case of a network interruption) that Rabbit will deliver a message to a consumer but not be sure it got there, in which case it may resend it later. It probably won't happen often, but it can, so you might want to be prepared. There might be some option to tell a Rabbit queue not to care whether a message is received, but everything I've done has required guaranteed delivery, so I wouldn't be familiar with it.

correl commented 9 years ago

Though, as an aside, a logging framework is a perfect example of a system that would want to discard messages when it's overloaded ;)

mgroves commented 9 years ago

Trying out EasyNetQ today, and I ran across this in the docs: "If you start publishing and no subscribers have ever been started then your messages simply disappear. This is by design." An interesting design, and it explains why nothing was getting queued up after I started the consumer.

mgroves commented 9 years ago

Another note from the docs that I think sums up what you guys were trying to get through my thick skull: "If you call Subscribe two times with the same message type and subscription id, you will create two consumers consuming from the same queue. RabbitMQ will then round-robin successive messages to each consumer in turn. This is great for scaling and work-sharing." This is exactly what I want.
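For reference, a sketch of that work-sharing setup with the classic EasyNetQ API (CallRecord and SaveRecord are hypothetical, as in the earlier sketch):

```csharp
var bus = RabbitHutch.CreateBus("host=localhost");

// Two subscribers, same message type, same subscription id: they share one queue,
// and RabbitMQ round-robins messages between them, so each message is processed once.
bus.Subscribe<CallRecord>("empiricall", record => SaveRecord(record));
bus.Subscribe<CallRecord>("empiricall", record => SaveRecord(record));
```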

mgroves commented 9 years ago

I think EasyNetQ is just what I wanted. If one or both of you (or anyone else, really) wouldn't mind, could you please review the implementation in this branch: https://github.com/mgroves/EmpiriCall/tree/issue1codereview and let me know what you think?

mgroves commented 9 years ago

Okay, I'm calling this one good for now. I'm always open to more ideas and review though, just open up a new issue!