Closed reimertz closed 6 years ago
Still not 100% sure what happened on production yet. Having long custom timeouts set for services which never end seems like it could have been the problem.
I think the most correct solution is to enforce a concurrency limit on how many times a service can be executed simultaneously. In other words, the same service could only have N instances running at once. In the majority of cases this limit should never come up, but if a service with a high custom timeout value is run repeatedly and hangs, the concurrency limit would kick in.
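The per-service limit described above could be sketched roughly like this. This is only an illustrative assumption of how such a counter might work, not hook.io's actual implementation; the names `tryAcquire`, `release`, and `MAX_CONCURRENT` are made up:

```javascript
// Hypothetical per-service concurrency limiter (illustrative names only).
var MAX_CONCURRENT = 5;   // N instances allowed at once per service
var runningCounts = {};   // serviceName -> number of currently executing instances

function tryAcquire (serviceName) {
  var current = runningCounts[serviceName] || 0;
  if (current >= MAX_CONCURRENT) {
    return false;         // limit hit: reject (or queue) this request
  }
  runningCounts[serviceName] = current + 1;
  return true;
}

function release (serviceName) {
  // Called when an instance exits, so hung services eventually free slots
  // once their timeout fires.
  runningCounts[serviceName] = Math.max(0, (runningCounts[serviceName] || 0) - 1);
}
```

A hung service with a long timeout would exhaust its N slots and then be rejected until `release` runs, which is exactly the failure mode being discussed.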
Will continue to investigate.
Legit solution.
By the way, is each instance used only once and then shut down? Or could it be reused within its given timeout if there is a new request on the given hook URL?
It might be easier on the server if instances are reused.
You'd really have to read the documentation, blog posts, and source code to understand specifically how the hook service life-cycle works, especially if you'd like to comment on what might be easy versus hard...
When I say "instance" in this context, I am referring to a single system process running a single copy of a user's hook service. Every request is served by a one-time-use operating system process.
To answer your question: the instance is shut down every time. This is by design and works well. Reusing instances would create a stateful process, which can cause major headaches.
Check out https://hook.io/blog/the-monolith-versus-the-microservice-a-tale-of-two-applications
Spawn lifecycle can be found here: https://github.com/bigcompany/hook.io/blob/master/lib/resources/hook/spawnService.js#L15
Additionally, I have researched the idea of persistent outgoing connections, which would require some major changes to the core spawning logic.
Ok, good point and thanks for your answer. Will do some more research/gather more background knowledge before asking questions in the future.
Cool. Thank you for the feedback. I'm glad we found this now, rather than later.
The entire project is open-source and a lot of the core functionality has already been broken into separate modules. Check out the dependency tree a bit. You might find something we can improve on.
This has been added, and I've set a fairly high number of concurrent services for active users.
I'll be monitoring the concurrency numbers for services and will soon be changing the total amount of concurrent services allowed based on the account's subscription plan.
The main reason I didn't go live with the hard limits immediately is that there are a few ways the limits could trigger incorrectly, and I didn't want to unintentionally rate limit any users. Will be going live with limits sometime in the next few weeks.
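The plan-based limit mentioned above might look something like the following. The plan names and numbers here are entirely made up for illustration; hook.io's actual tiers and values are not specified in this thread:

```javascript
// Hypothetical mapping from subscription plan to allowed concurrent
// services. Plan names and limits are illustrative assumptions.
var planLimits = {
  free: 2,
  paid: 10,
  business: 50
};

function concurrencyLimitFor (account) {
  // Unknown or missing plans fall back to the most restrictive tier.
  return planLimits[account.plan] || planLimits.free;
}
```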
Concurrency limits have been live for some time now and are mostly working. Users now have the ability to unlock their accounts on request in case of rate limiting.
I'm now working on improving the UX for visibility of rate limits and service usage.
Closing this issue as resolved.
First, sorry for bringing down the entire service.
So, I wanted to check whether WebSockets worked properly, so I booted up a simple ws-example with a 60-second timeout. What I accidentally did was to also set it as a cron job that spawned a micro-service every second. #stupid
This resulted in some major issues.
I guess a solution might be to have a magic formula that calculates whether a given combination of cron schedule and timeout is valid.
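One possible shape for that "magic formula", sketched under the assumption that the worst case is every scheduled run hanging until its timeout: the number of overlapping runs is roughly the timeout divided by the cron interval, and that should stay below the concurrency limit. The function name and parameters are hypothetical:

```javascript
// Hypothetical validity check for a cron schedule + service timeout combo.
// Assumption: in the worst case every run hangs for its full timeout, so
// at most ceil(timeoutMs / cronIntervalMs) runs can overlap at once.
function scheduleIsSafe (cronIntervalMs, timeoutMs, maxConcurrent) {
  var worstCaseOverlap = Math.ceil(timeoutMs / cronIntervalMs);
  return worstCaseOverlap <= maxConcurrent;
}
```

The incident described in this issue (a 60-second timeout fired every second) would give a worst-case overlap of 60 simultaneous processes per service, which such a check would reject.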