stackvana / hook.io

Open-Source Microservice Hosting Platform
https://hook.io

Considering implementing a concurrency limit on hook execution #191

Closed: reimertz closed this issue 6 years ago

reimertz commented 8 years ago

First, sorry for bringing down the entire service.

I wanted to check whether web-sockets worked properly, so I booted up a simple ws-example with a 60-second timeout. What I accidentally did was also register it as a cron job, which spawned a new micro-service every second. #stupid

This resulted in some major issues.

I guess a solution might be to have a magic formula that calculates whether a given combination of cron schedule and timeout is valid, something like:

timeout = Math.min(timeout, MAX_TIMEOUT_WHEN_CRON_EVERY_SECOND)
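
Fleshed out a little (just a sketch, the names are made up):

    // Sketch only: clamp a hook's timeout so a run can't outlive the cron interval.
    // clampTimeout and cronIntervalMs are made-up names for illustration.
    function clampTimeout (timeoutMs, cronIntervalMs) {
      // never let a single run live longer than the gap between cron ticks
      return Math.min(timeoutMs, cronIntervalMs);
    }

    // e.g. a job firing every second should not keep a 60s timeout
    var timeout = clampTimeout(60000, 1000); // => 1000
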
Marak commented 8 years ago

Still not 100% sure what happened in production yet. Long custom timeouts set on services that never end seem like they could have been the problem.

I think the most correct solution might be to enforce a concurrency limit on the number of times a service can be executing at the same time. In other words, the same service could only have N instances running at once. In the majority of cases this limit should never come into play. If a service has a high custom timeout value, is run repeatedly, and hangs, the concurrency limit should kick in. A rough sketch of the idea is below.
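
Conceptually, something like this (rough sketch only, not the actual hook.io code; the names are illustrative):

    // Illustrative per-service concurrency gate (not the real implementation).
    var MAX_CONCURRENT = 5;   // hypothetical limit per service
    var running = {};         // hook name -> number of live instances

    function tryAcquire (hookName) {
      var count = running[hookName] || 0;
      if (count >= MAX_CONCURRENT) {
        return false;         // limit reached, reject this execution
      }
      running[hookName] = count + 1;
      return true;
    }

    function release (hookName) {
      running[hookName] = Math.max((running[hookName] || 1) - 1, 0);
    }

If tryAcquire returns false, the request would get an error response instead of spawning another process, and release would run whenever an instance exits.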

Will continue to investigate.

reimertz commented 8 years ago

Legit solution.

By the way, is each instance only used once and then shut down? Or could it be used again within its given timeout if there is a new request on the given hook url?

It might be easier on the server if instances are reused.

Marak commented 8 years ago

You'd really have to read the documentation, blog posts, and source code to understand specifically how the hook service life-cycle works...especially if you'd like to comment on what might be easy versus hard...

When I say instance in this context, I am referring to a single system process running a single copy of a user's hook service. Every request is served by a one-time-use operating system process.

To answer your question, the instance is shut down every time. This is by design and works well. Re-using instances would create a stateful process, which can cause major headaches.
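
Roughly, the per-request lifecycle is: spawn a fresh process, run the hook, then tear it down. A simplified sketch of that shape (not the actual spawnService code; the names and messaging are assumptions):

    // Simplified illustration of a one-time-use process per request.
    // Not the real spawnService.js, just the general shape of the lifecycle.
    var child_process = require('child_process');

    function runHookOnce (hookPath, payload, timeoutMs, cb) {
      var child = child_process.fork(hookPath);   // fresh process for this request
      var timer = setTimeout(function () {
        child.kill();                             // enforce the hook's timeout
        cb(new Error('hook timed out'));
      }, timeoutMs);

      child.once('message', function (result) {
        clearTimeout(timer);
        child.kill();                             // instance is never re-used
        cb(null, result);
      });

      child.send(payload);                        // hand the request to the hook
    }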

Check out https://hook.io/blog/the-monolith-versus-the-microservice-a-tale-of-two-applications

Marak commented 8 years ago

Spawn lifecycle can be found here: https://github.com/bigcompany/hook.io/blob/master/lib/resources/hook/spawnService.js#L15

Marak commented 8 years ago

Additionally, I have researched the idea of persistent outgoing connections, which would require some major changes to the core spawning logic.

see: https://github.com/bigcompany/hook.io/issues/113

reimertz commented 8 years ago

Ok, good point and thanks for your answer. I'll do some more research and gather more background knowledge before asking questions in the future.

Marak commented 8 years ago

Cool. Thank you for the feedback. I'm glad we found this now, rather than later.

The entire project is open-source and a lot of the core functionality has already been broken into separate modules. Check out the dependency tree a bit. You might find something we can improve on.

Marak commented 7 years ago

This has been added, and I've set a fairly high limit on the number of concurrent services for active users.

I'll be monitoring the concurrency numbers for services and will soon be changing the total number of concurrent services allowed based on the account's subscription plan.

The main reason I didn't go live with the hard limits yet is that there are a few ways the limits could trigger when they shouldn't, and I didn't want to unintentionally rate-limit any users. Will be going live with limits sometime in the next few weeks.
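
For illustration, the per-plan caps could end up looking something like this (hypothetical plan names and numbers, not the real limits):

    // Hypothetical mapping of subscription plan -> allowed concurrent services.
    var concurrencyLimits = {
      free:       2,
      paid:       10,
      enterprise: 50
    };

    function maxConcurrent (plan) {
      return concurrencyLimits[plan] || concurrencyLimits.free;
    }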

Marak commented 6 years ago

We've had concurrency limits live for some time now and they are mostly working. Users now have the ability to get their accounts unlocked on request in case of rate limiting.

I'm now working on improving the UX for visibility of rate limits and service usage.

Closing this issue as resolved.