vsivsi / meteor-job-collection

A persistent and reactive job queue for Meteor, supporting distributed workers that can run anywhere.
https://atmospherejs.com/vsivsi/job-collection

[Question]: How to enable concurrency / multithreading / vertically scale over time? #248

Closed · joncursi closed this 7 years ago

joncursi commented 7 years ago

I have a Meteor app hosted on Galaxy that is based in the cryptocurrency space. I need to fetch the price of every coin on my system every 5 minutes or less.

As far as the total number of coins: let's say I have 1,000 coins on the platform right now, but a year from now, 2,000, and the year after that, 5,000. The number of coins is growing exponentially.

As far as timing: let's say that my SLA with users is that the price will be no more than 5 minutes out of date. This is a fixed value that cannot get larger over time. So this means I need to run...

In other words: I need to figure out a way to scale.

I initially built a naive price-refresh implementation directly within my Galaxy app, but quickly realized this was a bad idea because the jobs consume all of the server resources and block the responsiveness of the actual application for my end-users. So I started investigating microservices architecture, and arrived here :)

This package looks great, but I can't find much information about scaling for the future / how to enable multithreading / job concurrency.

My thought process here is to create a dedicated microservice project whose sole purpose is to use this package to refresh every coin price. In order for this to scale for my needs, it's inevitable that I'll need multiple threads so that multiple jobs run in parallel, to meet the overall goal of X jobs completed in under 5 minutes. Five years from now, when there are 20,000 coins to refresh, there's no way that will complete in under 5 minutes on a single-threaded event loop!

So this leads me to the topic of how to scale. I'm thinking that vertical scaling is the way to go, since this is mainly CPU intensive, so I could just throw more CPUs at it.

My general questions around scaling are these:

  1. Am I correct in assuming that vertical scaling is the way to go?
  2. How do I instruct this package to take advantage of multiple cores on the server / how do I spin up multiple workers? Is there a count / number I can configure? Does it auto-detect? Do I need to use something like meteorhacks:cluster? Can I spawn multiple workers on just a single-core server (i.e. a Galaxy compact container)? If I were to spawn multiple Galaxy compact containers, would they be able to coordinate with each other to get the work done without stepping on each other's toes, or is it better to stick with one Galaxy container and just increase its size vertically?
  3. This brings me to my third point. A lot of these packages are Meteor-specific, but I'm wondering if I should even base this on the Meteor framework, since it's just a back-end process stuffing new data into MongoDB with no UI. What are your thoughts on using the meteor-job NPM package directly, and on how to scale that vertically outside of Meteor?
  4. Would you recommend Galaxy for this use case (assuming Meteor were the framework), or is it a better idea to use a dedicated server (e.g. a DigitalOcean droplet or Linode), since you can buy a lot more CPU for the price and high availability is not a concern given this is just a microservice the end user will never access directly?

Sorry for the long and drawn-out post, I'm just trying to get the full picture of what to expect before I commit to going down this path.

janat08 commented 7 years ago

This package doesn't deal with multithreading. It makes sure that if you have multiple servers, they can tell which one is handling which coin price update, so the same update isn't done more than once. So yes, Galaxy's offering of multi-core servers is questionable, since nowhere do they suggest that they run multiple Docker containers on them, which would be like meteorhacks:cluster without the bad reputation of not being production-worthy. If you have autoscaling for your deployment, feel free to just host the work on the same Meteor servers, although Node.js server improvements don't get immediately ported over to Meteor. I imagine one of the issues this package resolves is that if a user subscribes to data from a server and that server then begins doing work, the publication won't update.

vsivsi commented 7 years ago

@joncursi There's a lot in your question for a github issue thread on a package like job-collection. You are essentially asking for a design consultation for your app, which isn't really an "issue" 😉

I really can't comment at all on the appropriateness of deploying your Meteor process(es) to Galaxy for your app... You are going to need to do your own research on that.

For worker scalability my advice is to just use vanilla node.js with the meteor-job npm package.

You don't need 98% of the Meteor server runtime to operate a worker. You will need to provision your own machines (bare-metal, EC2, etc) and maintain your own DDP connections from the worker processes to the Meteor server. Node.js is inherently single threaded, but it is no big deal to run N node.js processes on an N core machine. And if your workload is highly asynchronous (with lots of DB queries, Network API calls, or other disk or network I/O) you can configure each worker to efficiently handle multiple simultaneous jobs per worker process (if it is compute heavy, then just leave it at 1 job per process).
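To make that concrete, a bare-node worker ends up looking roughly like the untested sketch below. The host, the `priceJobs` / `refreshPrice` names, and the `fetchPrice` / `writePrice` helpers are all placeholders, and the ddp-login step only matters if you've restricted the job-collection worker methods to authenticated users.

```js
// worker.js -- plain node.js worker, no Meteor runtime required
// npm install ddp ddp-login meteor-job
const DDP = require('ddp');
const DDPLogin = require('ddp-login');
const Job = require('meteor-job');

// DDP connection back to the Meteor app hosting the job collection
const ddp = new DDP({ host: 'meteor.example.com', port: 3000, use_ejson: true });
Job.setDDP(ddp);

ddp.connect((err) => {
  if (err) throw err;
  // Optional: resume-token login via the METEOR_TOKEN env var, only needed if
  // the job collection's worker methods require a logged-in user
  DDPLogin(ddp, { env: 'METEOR_TOKEN' }, (loginErr, token) => {
    if (loginErr) throw loginErr;
    // Pull 'refreshPrice' jobs from the 'priceJobs' collection.
    // concurrency: jobs this one process runs at once -- raise it for I/O-bound
    // work, leave it at 1 for compute-heavy jobs.
    Job.processJobs(
      'priceJobs',
      'refreshPrice',
      { concurrency: 8, payload: 1, pollInterval: 5000 },
      (job, cb) => {
        fetchPrice(job.data.coin)                              // placeholder async API call
          .then((price) => writePrice(job.data.coin, price))   // placeholder DB write
          .then(() => { job.done(); cb(); })
          .catch((e) => { job.fail('' + e); cb(); });
      }
    );
  });
});
```

Run N copies of that process on an N-core machine (under systemd, pm2, or whatever you prefer) and the job collection hands each job to exactly one of them.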

Then scaling simply becomes a matter of adding more worker processes (on more machines) over time as your application/user base grows. You will probably want to look into a cluster management solution (e.g. Consul/Nomad or Etcd/Kubernetes) for managing your worker instances. But beyond mentioning it, that is outside the scope of support for a package like this.

Hope that helps somewhat.

joncursi commented 7 years ago

Thank you, that actually helps a lot and was the sort of information I was looking for!

joncursi commented 7 years ago

@vsivsi I started looking at AWS Lambda as a possible alternative to managing a cluster of EC2s, Docker containers, or Galaxy containers. Theoretically, 1,000 jobs created from the application server could trigger 1,000 Lambda functions to run in parallel, and the entire batch would be done in the time it takes for the longest request to come back... seconds. Have you tried using Lambda with meteor-job-collection? I've been doing some research on this and I'm trying to figure out a clever way to trigger a Lambda function from within Meteor / meteor-job-collection: essentially, creating 1,000 unique jobs and triggering 1,000 Lambda functions of the same type to run in parallel. Does this sound like a viable approach to you?

vsivsi commented 7 years ago

I haven't. I've looked into Lambda for other applications, and it is really at its strongest for event-driven, highly asynchronous tasks. And it specifically doesn't work for one of the main uses for job-collection, namely, longer-running, heavy compute jobs. IIRC, a Lambda invocation needs to complete within 5 (CPU?) minutes, and also can't consume very much memory (don't recall the limit). So it's really a tool for highly scalable microservices architectures.

You can trigger a lambda function with a web API request to an HTTP endpoint you define, so a job collection worker (say on a Meteor server) can trivially invoke hundreds of simultaneous lambda function invocations, each with a simple HTTP request, and then asynchronously wait for the response before succeeding or failing the job in job-collection. The lambda function can either return the result data for the worker to write, or it can write it itself and just return status.
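For illustration only, a worker along these lines on the Meteor side would do it (a sketch: `priceJobs` is assumed to be a JobCollection defined elsewhere, and the endpoint URL is a placeholder):

```js
// Meteor server: one job per coin, one Lambda invocation per job.
// Requires the 'http' package (meteor add http).
priceJobs.processJobs('refreshPrice', { concurrency: 100, payload: 1 },
  (job, callback) => {
    HTTP.post('https://<your-api-gateway-endpoint>/refresh-price',
      { data: { coin: job.data.coin } },
      (err, result) => {
        if (err) {
          job.fail('' + err);
        } else {
          // The Lambda can return the price for this worker to write,
          // or write it to the DB itself and just return a status.
          job.done(result.data || {});
        }
        callback();
      }
    );
  }
);
```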

Lambda functions can also be "scheduled" to run periodically (a la cron) entirely within the AWS infrastructure, no job-collection required. But then you'd need to set up a database observe or web-hook on the application server side to be notified when results are ready.
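The observe route could be as small as this sketch (the `prices` collection and its fields are made up):

```js
// Meteor server: get notified when a scheduled Lambda writes a fresh price.
const Prices = new Mongo.Collection('prices');

Meteor.startup(() => {
  Prices.find({}).observe({
    added(doc) {
      // first price document seen for this coin
      console.log(`new price for ${doc.coin}: ${doc.price}`);
    },
    changed(newDoc, oldDoc) {
      // a scheduled Lambda updated an existing price
      console.log(`${newDoc.coin}: ${oldDoc.price} -> ${newDoc.price}`);
    },
  });
});
```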

Honestly there are a hundred ways to do everything these days, and so you just need to get crystal clear about your application requirements, and then evaluate a bunch of possible solutions until a clear front runner emerges.

FWIW, I shy away from Lambda (and lots of other "cloud" value-add services) because of serious concerns about vendor lock-in. But YMMV depending on your requirements, business model, etc.

joncursi commented 7 years ago

@vsivsi awesome, thank you very much for sharing your experience and wisdom. After doing some more homework on this, I do think it is well suited to my particular use case, since I have lots of tiny, mostly asynchronous jobs (waiting on HTTP requests and shoving the result into Mongo) that need to complete in a short amount of time. So I'm planning to use Lambda in conjunction with meteor-job-collection: essentially, every 5 minutes meteor-job-collection will create a new job and pass it to a worker, and the worker's job is simply to loop over every coin in the database and invoke a Lambda function N times. So 1,000 coins -> fire off 1,000 Lambda URL requests with unique parameters -> done.
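For anyone else reading along, the fan-out worker I have in mind is roughly the untested sketch below. The `Coins` collection, `priceJobs`, the job type, and the endpoint are all placeholders, and a real app would guard against saving duplicate repeating jobs.

```js
// Repeating job: every 5 minutes, fan out one Lambda invocation per coin.
// Requires the 'http' package on the Meteor side.
const Coins = new Mongo.Collection('coins');

// Schedule the repeating job once
new Job(priceJobs, 'refreshAllPrices', {})
  .repeat({ wait: 5 * 60 * 1000 })   // run again every 5 minutes
  .save();

priceJobs.processJobs('refreshAllPrices', { concurrency: 1 }, (job, callback) => {
  const coins = Coins.find({}, { fields: { symbol: 1 } }).fetch();
  let pending = coins.length;
  let failures = 0;

  if (pending === 0) { job.done(); return callback(); }

  coins.forEach((coin) => {
    // One Lambda invocation per coin via its HTTP endpoint
    HTTP.post('https://<your-api-gateway-endpoint>/refresh-price',
      { data: { coin: coin.symbol } },
      (err) => {
        if (err) failures += 1;
        if (--pending === 0) {
          if (failures === 0) { job.done(); } else { job.fail(failures + ' coin refreshes failed'); }
          callback();
        }
      }
    );
  });
});
```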

It feels really elegant typing it out like that (and in my head), so I'm hoping the implementation is just as nice! The biggest sell for me is not having to worry about configuring or managing containers and clusters, and being able to "easily" scale up over time as demand grows. Plus, it looks cheap! I've really been loving the "NoOps" approach to life ever since I moved from Linodes to Meteor Galaxy, and Lambda really seems to fit into that realm! :)