Open aalex opened 5 years ago
When it receives an HTTP POST, it launches a Docker container. Instead, we would launch many Fargate tasks, which lets us run many of them at the same time. The task service is the only one that is not a Fargate service. (Fargate runs Docker containers serverlessly.)
As data release coordinators, we want the ETL to be as fast as possible.
Acceptance criteria
Technical discussion
Parsing all studies could take 10 minutes instead of 3 hours.
To run them in parallel, we could use AWS Fargate (or later AWS Lambdas) instead of a series of Docker containers that run on a large EC2 instance, which doesn't scale in or out (so its yearly cost is higher).
Changing to AWS Lambdas would require more effort than changing to AWS Fargate, because it would mean removing Spark.
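
As a rough illustration of the Fargate option, here is a minimal sketch of how one task per study could be started, assuming the ETL image reads the study to parse from an environment variable. The names `STUDY_ID`, `build_run_task_request` and `launch_etl_tasks`, as well as the cluster, task definition, container and subnet values, are hypothetical and not taken from the current code. The `ecs_client` argument is expected to behave like a boto3 ECS client, whose `run_task` call returns as soon as the task is accepted, so the tasks run concurrently.

```python
from typing import Any, Dict, List


def build_run_task_request(
    study_id: str,
    cluster: str,
    task_definition: str,
    container_name: str,
    subnets: List[str],
) -> Dict[str, Any]:
    """Build the keyword arguments for one ECS run_task call on Fargate."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            # Assumption: the subnets can reach the image registry
            # without a public IP (for example through a NAT gateway).
            "awsvpcConfiguration": {
                "subnets": subnets,
                "assignPublicIp": "DISABLED",
            }
        },
        "overrides": {
            "containerOverrides": [
                {
                    "name": container_name,
                    # Hypothetical variable telling the ETL which study to parse.
                    "environment": [{"name": "STUDY_ID", "value": study_id}],
                }
            ]
        },
    }


def launch_etl_tasks(ecs_client: Any, study_ids: List[str], **task_settings: Any) -> List[str]:
    """Start one Fargate task per study and return the ARNs of the started tasks."""
    task_arns: List[str] = []
    for study_id in study_ids:
        request = build_run_task_request(study_id, **task_settings)
        # run_task does not wait for the container to finish.
        response = ecs_client.run_task(**request)
        task_arns.extend(task["taskArn"] for task in response.get("tasks", []))
    return task_arns
```

With boto3 installed, this could be called as `launch_etl_tasks(boto3.client("ecs"), study_ids, cluster=..., task_definition=..., container_name=..., subnets=[...])`. A real version would also need to inspect the `failures` list in each response and respect account-level task quotas.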