ryanhiebert / hirefire

A Python lib to integrate with the HireFire service -- The Heroku Process Manager.
https://hirefire.readthedocs.io/

Use concurrent futures to reduce blocking IO #37

Closed codingjoe closed 6 years ago

codingjoe commented 6 years ago

Queue inspection in particular can currently be very slow, because a transport to the message broker has to be opened and messages exchanged for every queue. Especially in a multi-queue setup, these operations should happen concurrently.
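Roughly, the idea is to fan the per-queue round trips out to a thread pool so the total latency tracks the slowest queue instead of the sum of all of them. A minimal sketch (the `inspect_queue` helper and the queue names are placeholders, not this library's actual API):

```python
from concurrent.futures import ThreadPoolExecutor


def inspect_queue(queue_name):
    """Stand-in for a real broker round trip that counts messages in one queue.

    In the real library this is the slow part: it opens a transport to the
    message broker and exchanges messages before it can return a count.
    """
    return 0  # placeholder count


def inspect_queues(queue_names):
    """Inspect all queues concurrently instead of one after another."""
    with ThreadPoolExecutor(max_workers=len(queue_names) or 1) as executor:
        counts = executor.map(inspect_queue, queue_names)
    return dict(zip(queue_names, counts))


print(inspect_queues(["default", "mail", "reports"]))
```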

codingjoe commented 6 years ago

Hi @ryanhiebert, I hope you like it. It should give this beauty a couple of extra horsepower. It currently takes us 3200ms to process this view. I will give you an update on how the performance improves with concurrent IO.

codingjoe commented 6 years ago

Sorry for all the commits, I was so lazy I coded it in a browser :P Anyhow, I tested it. It works, and it's faster. Best! Joe

codingjoe commented 6 years ago

/ping @ryanhiebert

ryanhiebert commented 6 years ago

Perhaps this needs to be a setting, so that if we find bugs with race conditions, etc., they will only affect those who have opted into using this feature. It does seem like a pretty cool thing to do.

ryanhiebert commented 6 years ago

I'd really like this to be behind a setting. @codingjoe: are you willing to add such a setting, so that we can be sure things won't break for existing users, even if there's some weirdness in one of the backend types that isn't compatible with threads?

codingjoe commented 6 years ago

That's a great idea. I'll do that. It will be disabled by default to ensure backwards compatibility.

ryanhiebert commented 6 years ago

I have Slack periodically reminding me about this so it doesn't fall off my radar, and I'm passing that reminder along to you, @codingjoe. Do you have an idea of when you might be able to look at it, so I don't keep bugging you when you know you won't be able to get to it yet?

codingjoe commented 6 years ago

@ryanhiebert I am at DjangoCon now and have plenty of time to pick up my pending contributions. I'll work on it right now.

codingjoe commented 6 years ago

@ryanhiebert do you want this to be a Django setting? It's tricky, since the feature is implemented in the core library. I might need to subclass some parts in the contrib section to make sure it works properly in Django.
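One way it could be wired up, purely as a sketch (the module layout, flag name, and the `HIREFIRE_USE_CONCURRENCY` setting name are made up for illustration): keep a plain flag in the framework-agnostic core, and have the Django contrib layer copy a setting into it at startup.

```python
# hirefire/procs.py (core, framework-agnostic) -- hypothetical layout
use_concurrency = False  # off by default so existing users see no change


# hirefire/contrib/django/... (Django glue) -- hypothetical layout
from django.conf import settings

from hirefire import procs

# Copy the opt-in Django setting into the core flag once at startup.
procs.use_concurrency = getattr(settings, "HIREFIRE_USE_CONCURRENCY", False)
```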

codingjoe commented 6 years ago

@ryanhiebert ok, this should do it. BTW, I see that the custom JSON encoder isn't really needed anywhere, since you don't encode datetimes. You might want to consider dropping it.

codingjoe commented 6 years ago

Oh, I also took the liberty of using a JsonResponse. It's been there since 1.8. I couldn't find an official list of versions this library supports, but the oldest Django version still supported by Django itself is 1.11, so this should be fine.
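For reference, this is roughly what that looks like in a view (illustrative only; the view name and payload are placeholders, not the library's actual response shape):

```python
from django.http import JsonResponse


def hirefire_info(request):
    # JsonResponse (in Django since 1.8) sets the application/json content
    # type and serialises the payload itself, so no custom encoder or manual
    # HttpResponse construction is needed.
    payload = [{"name": "worker", "quantity": 0}]  # placeholder proc data
    return JsonResponse(payload, safe=False)
```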

codingjoe commented 6 years ago

ping @ryanhiebert

ryanhiebert commented 6 years ago

I'm sorry that I haven't responded on this. The design looks great. It turns out this is going to break my own usage of the library, because in my $WORK code I use that native_dump_procs function, and that's why I didn't get to it like I should have. Thank you for the ping, I needed it.

It looks really good, despite the hangup that has been slowing me down, and I think it's the right way to go.

codingjoe commented 6 years ago

@ryanhiebert a little anecdote: this change + disabling worker pool inspection increased performance on our site by 206x, from >3s to <15ms ;)

ryanhiebert commented 6 years ago

Sweet! The worker inspection is what really kills the time, so that doesn't surprise me in the least. I'm not sure, though, that you necessarily gained much just from this change. The problem is that I can't get a reliable number, one that won't shut down running tasks, unless I use the worker inspection, so skipping it is kind of a non-starter for my use cases.

codingjoe commented 6 years ago

Yes, that's right. We run many queues, though. Concurrency cut the time by 10x, and skipping inspection by another 20x. We don't really need the numbers to be perfect, since we only scale on queue size for slow, high-volume tasks. We scale high-speed queues based on message rate; we wrote our own Procs for that, and I'll port them upstream once I get the time. Message rate can be really helpful if you want to ensure a certain message ack time.
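In case it helps, a rough sketch of what such a message-rate-based proc might look like, assuming the base class in `hirefire.procs` can be subclassed with a `quantity()` hook the way the built-in procs are; `get_message_rate()` and the target rate are hypothetical:

```python
from hirefire.procs import Proc


def get_message_rate(queues):
    """Hypothetical stand-in: return the current inbound rate (msgs/sec)
    for the given queues, e.g. from the broker's management API."""
    return 0.0


class HighSpeedProc(Proc):
    name = "highspeed"
    queues = ["highspeed"]

    target_rate_per_worker = 100  # msgs/sec one worker can ack within the SLA

    def quantity(self, **kwargs):
        # Scale on throughput instead of queue depth: run enough workers to
        # keep ack time bounded even when the queue itself stays short.
        rate = get_message_rate(self.queues)
        return max(1, int(rate / self.target_rate_per_worker))
```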