zerobase-io / smart-tracing-api

Backend/DB/API repository for the Zerobase platform
Apache License 2.0

Create an SQS queue for submitting analysis tasks #75

Open brianok77 opened 4 years ago

brianok77 commented 4 years ago

We need an SQS queue that will be used by the workers in #74 to get risk scoring tasks to perform. This queue will contain messages with payloads describing the task, including (a) which device we are assessing, (b) the identifier of the risk assessment measure to be run to generate the score, and (c) the timestamp of the request (used for storing the results).
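For illustration, the payload described above could look something like the sketch below. The field names (`device_id`, `measure_id`, `requested_at`) and the JSON encoding are assumptions made for this example; the issue does not fix a schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ScoringTask:
    """Hypothetical message payload for one risk scoring task."""
    device_id: str      # (a) which device is being assessed
    measure_id: str     # (b) identifier of the risk assessment measure to run
    requested_at: str   # (c) ISO-8601 UTC timestamp of the request

    def to_message_body(self) -> str:
        # SQS message bodies are strings, so serialize to JSON.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_message_body(body: str) -> "ScoringTask":
        return ScoringTask(**json.loads(body))


def new_task(device_id: str, measure_id: str,
             now: Optional[datetime] = None) -> ScoringTask:
    # Stamp the task with the request time (defaults to the current UTC time).
    now = now or datetime.now(timezone.utc)
    return ScoringTask(device_id, measure_id, now.isoformat())
```

A worker would call `ScoringTask.from_message_body(...)` on the received message body and use `requested_at` as part of the key when storing the result.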

NOTE: We need to make sure the SQS queue's visibility timeout is sufficiently long. The default of 30 seconds may not be long enough: if a worker is still scoring when the timeout expires, the message becomes visible again and another worker will repeat the work.

NOTE 2: We should also configure a dead-letter-queue (DLQ) that will capture messages that cannot be processed. If it becomes a problem area, later we can add a monitor to the DLQ to generate notifications when messages are failing to process.
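A rough sketch of how both notes might translate into queue attributes, assuming the queue is created with boto3. `VisibilityTimeout` and `RedrivePolicy` are standard SQS queue attributes; the 300-second timeout, the `maxReceiveCount` of 5, and the function names are placeholders chosen for this example, not agreed values.

```python
import json


def build_queue_attributes(dlq_arn: str,
                           visibility_timeout_seconds: int = 300,
                           max_receive_count: int = 5) -> dict:
    """Build SQS attributes: a longer visibility timeout plus a DLQ redrive policy."""
    # SQS accepts visibility timeouts from 0 seconds up to 12 hours.
    if not 0 <= visibility_timeout_seconds <= 43200:
        raise ValueError("visibility timeout must be between 0 and 43200 seconds")
    if max_receive_count < 1:
        raise ValueError("max_receive_count must be at least 1")
    return {
        # Attribute values are passed as strings.
        "VisibilityTimeout": str(visibility_timeout_seconds),
        # After max_receive_count failed receives, SQS moves the message to the DLQ.
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": max_receive_count,
        }),
    }


def create_task_queue(sqs_client, queue_name: str, dlq_arn: str) -> str:
    """Create the task queue; sqs_client is expected to be boto3.client("sqs")."""
    response = sqs_client.create_queue(
        QueueName=queue_name,
        Attributes=build_queue_attributes(dlq_arn),
    )
    return response["QueueUrl"]
```

The timeout should comfortably exceed the longest expected scoring run; the DLQ itself is just another queue that has to exist before its ARN can be referenced here.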

senthilmk commented 4 years ago

Is SQS the preferred choice, or would any JMS-compliant broker do? Amazon MQ (ActiveMQ) is another option if SQS's quirks are not optimal for this use case ...

brianok77 commented 4 years ago

AWS suggests using SQS. If you have a compelling reason to use something else, please detail your concerns. Using SQS will cost significantly less for this use case (as I don't foresee us doing risk assessments continuously around the clock) and it will scale better for bursts of risk assessments should hotspots occur, as MQ does not autoscale (to the best of my knowledge).

From https://aws.amazon.com/amazon-mq/faqs/ .... Q: When should I use Amazon MQ vs. Amazon SQS and SNS?

Amazon MQ, Amazon SQS, and Amazon SNS are messaging services that are suitable for anyone from startups to enterprises. If you're using messaging with existing applications, and want to move your messaging to the cloud quickly and easily, we recommend you consider Amazon MQ. It supports industry-standard APIs and protocols so you can switch from any standards-based message broker to Amazon MQ without rewriting the messaging code in your applications. If you are building brand new applications in the cloud, we recommend you consider Amazon SQS and Amazon SNS. Amazon SQS and SNS are lightweight, fully managed message queue and topic services that scale almost infinitely and provide simple, easy-to-use APIs. You can use Amazon SQS and SNS to decouple and scale microservices, distributed systems, and serverless applications, and improve reliability.

senthilmk commented 4 years ago

SQS has its own quirks... message visibility is just one concern. It doesn't really comply with the requirements of a classic message broker.

There are other options for syncing tasks and task statuses.

brianok77 commented 4 years ago

What do you think about this @toadzky ? I still think SQS would be the better choice, but if everyone is more comfortable with legacy style brokers then I won't stand in the way. It will cost more for the foreseeable future given our analysis volume though.

toadzky commented 4 years ago

i'd probably just use sns/sqs. i'm not sure what concerns @senthilmk has about sqs that amazon mq would solve that aren't manageable. the visibility timeout seems like a minor thing to configure. to be honest, i'm not entirely sure why a message queue is involved at all. if all you are doing is triggering workers (which i assume aren't lambdas), why use a message queue at all? would a rest api work just as well? what are we expecting to populate the sqs queue with messages?

brianok77 commented 4 years ago

The design is to allow the number of workers to scale with demand. Each worker will run in a container and process a score for one subject at a time (up to N workers/container, where N is TBD). The subjects to be scored are stored in the queue, and the workers will take them off the queue and execute the scoring algorithm. We can autoscale this by monitoring the queue size and starting new containers when it exceeds a certain length (and likewise removing containers when it is empty). Without a queue, we wouldn't be able to scale effectively, and requests would error out if all workers were busy.
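To make the scaling idea concrete, here is a minimal sketch of one possible rule: size the fleet so there is roughly one worker per queued subject, clamped to a floor and a ceiling. This is a target-based variant of the threshold described above, and every name and limit in it is an assumption for illustration. The queue length would presumably come from the queue's `ApproximateNumberOfMessages` attribute.

```python
import math


def desired_containers(queue_length: int,
                       workers_per_container: int,
                       min_containers: int = 0,
                       max_containers: int = 10) -> int:
    """Return how many containers to run for the current queue backlog."""
    if queue_length < 0:
        raise ValueError("queue_length must not be negative")
    if workers_per_container < 1:
        raise ValueError("workers_per_container must be at least 1")
    # One worker handles one subject at a time, so round up to whole containers.
    needed = math.ceil(queue_length / workers_per_container)
    # Clamp so an empty queue scales down to the floor and a burst stops at the cap.
    return max(min_containers, min(max_containers, needed))
```

With N = 4 workers per container, a backlog of 9 subjects asks for 3 containers, and an empty queue scales back down to `min_containers`.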

The queue message will contain the subject to be scored and the algorithm to use. We will use a REST API to manage this: the API will put messages onto the queue, returning a claim ticket (job id), and will allow the results to be retrieved later using that claim ticket.
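The claim-ticket flow could be sketched roughly as below. This is an in-memory stand-in only: a `deque` plays the role of the SQS queue and a dict plays the role of the result store, and all class, method, and field names are hypothetical.

```python
import json
import uuid
from collections import deque


class ScoringJobs:
    """In-memory sketch of submit -> queue -> worker -> result retrieval."""

    def __init__(self):
        self.queue = deque()   # stand-in for the SQS queue
        self.results = {}      # stand-in for the result store, keyed by job id

    def submit(self, subject_id: str, algorithm: str) -> str:
        # What the REST endpoint would do: enqueue the task, return a claim ticket.
        job_id = str(uuid.uuid4())
        self.queue.append(json.dumps({
            "job_id": job_id,
            "subject_id": subject_id,
            "algorithm": algorithm,
        }))
        self.results[job_id] = None  # None marks the job as pending
        return job_id

    def work_one(self, score_fn) -> bool:
        # What a worker would do: take one task off the queue and score it.
        if not self.queue:
            return False
        task = json.loads(self.queue.popleft())
        self.results[task["job_id"]] = score_fn(task["subject_id"], task["algorithm"])
        return True

    def result(self, job_id: str):
        # Returns None while pending; raises KeyError for an unknown ticket.
        return self.results[job_id]
```

The point of the split is that `submit` returns immediately regardless of how busy the workers are, so callers never see an error just because every worker is occupied.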