marzeelabs / cordis-serverless

A serverless API for EU Cordis data

Store enriched projects in DynamoDB #1

Open · pvhee opened 7 years ago

pvhee commented 7 years ago

We need to store projects in a NoSQL database like DynamoDB, enriched with:

To avoid Lambda function timeouts, we might need to use SQS to queue up all the projects to be enriched (from the original XLS data), process them with a Lambda function, and store the enriched records in our data store.
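The producer side of that queue could look roughly like this minimal sketch, assuming each row of the XLS export has already been parsed into an object with hypothetical `id` and `title` fields. SQS `SendMessageBatch` accepts at most 10 entries per call, each with an `Id` that is unique within the request, so the rows are grouped into batches before sending; the SDK call itself is left out.

```typescript
// Hypothetical shape of one project row parsed from the CORDIS XLS export.
interface ProjectRow {
  id: string;
  title: string;
}

// One entry of an SQS SendMessageBatch request.
interface BatchEntry {
  Id: string; // must be unique within a single batch request
  MessageBody: string;
}

// SQS accepts at most 10 entries per SendMessageBatch call.
const SQS_MAX_BATCH_ENTRIES = 10;

// Group project rows into lists of entries, one list per SendMessageBatch call.
function toSqsBatches(rows: ProjectRow[], batchSize: number = SQS_MAX_BATCH_ENTRIES): BatchEntry[][] {
  if (batchSize < 1 || batchSize > SQS_MAX_BATCH_ENTRIES) {
    throw new RangeError(`batchSize must be between 1 and ${SQS_MAX_BATCH_ENTRIES}`);
  }
  const batches: BatchEntry[][] = [];
  for (let start = 0; start < rows.length; start += batchSize) {
    batches.push(
      rows.slice(start, start + batchSize).map((row, i) => ({
        Id: String(i), // position within the batch keeps Ids unique per request
        MessageBody: JSON.stringify(row),
      })),
    );
  }
  return batches;
}

// Usage: 23 rows become batches of 10, 10 and 3 messages.
const exampleRows: ProjectRow[] = Array.from({ length: 23 }, (_, n) => ({ id: `p${n}`, title: `Project ${n}` }));
console.log(toSqsBatches(exampleRows).map((batch) => batch.length));
```

The consuming Lambda then only has to enrich a small, bounded number of projects per invocation, which keeps each run well inside its timeout.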

pvhee commented 7 years ago

Getting some insight into Lambda timeouts and tuning DynamoDB write throughput from https://medium.com/@CodingJoe/dealing-with-dynamodb-write-capacity-limits-and-lambda-timeouts-f4e08d9f4b4f#.beltqobit and https://hackernoon.com/top-5-lessons-learned-from-trying-to-build-my-own-serverless-website-d65a168c5e6d#.ylm7i7xa2

This way we can avoid complicating our architecture with SQS and stick with DynamoDB alone. We could probably rate-limit writes to DynamoDB with https://github.com/jhurliman/node-rate-limiter
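The rate-limiting idea can be sketched as a token bucket sized to the table's provisioned write capacity, so the importer never sends more writes per second than the table can absorb. This is a hypothetical illustration of the technique, not node-rate-limiter's API: the `TokenBucket` class, its parameters and the injectable clock are assumptions, the clock being there only so the logic can be checked without real time passing.

```typescript
// Minimal token bucket (hypothetical, not node-rate-limiter's API).
// One token stands for one write capacity unit: one write per second
// of an item up to 1 KB.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  // `now` returns milliseconds; injectable so tests can use a fake clock.
  constructor(capacity: number, refillPerSecond: number, now: () => number = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full, allowing an initial burst
    this.lastRefillMs = now();
  }

  // Add the tokens earned since the last refill, capped at capacity.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = current;
  }

  // Take `count` tokens if available; otherwise take nothing and return false.
  tryRemove(count: number = 1): boolean {
    this.refill();
    if (this.tokens < count) return false;
    this.tokens -= count;
    return true;
  }

  // Milliseconds to wait until `count` tokens are available.
  msUntilAvailable(count: number = 1): number {
    this.refill();
    const missing = count - this.tokens;
    return missing <= 0 ? 0 : Math.ceil((missing / this.refillPerSecond) * 1000);
  }
}

// Usage with a fake clock: a table with 5 WCU allows about 5 small writes per second.
let fakeMs = 0;
const bucket = new TokenBucket(5, 5, () => fakeMs);
let accepted = 0;
for (let i = 0; i < 8; i++) {
  if (bucket.tryRemove()) accepted++; // only the first 5 writes fit in the initial burst
}
fakeMs = 400; // 0.4 s later, 2 more tokens have been refilled
console.log(accepted, bucket.tryRemove(2), bucket.msUntilAvailable());
```

In a real importer, a rejected write would wait `msUntilAvailable()` before being sent to DynamoDB rather than being dropped.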

Geo-annotation can happen by streaming table updates to Lambda functions (DynamoDB Streams), following https://aws.amazon.com/blogs/aws/dynamodb-update-triggers-streams-lambda-cross-region-replication-app/
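A minimal sketch of the filtering step in such a stream-triggered Lambda, assuming the stream's view type includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`) and that projects carry hypothetical `id`, `city` and `geo` attributes. The record types are trimmed-down local stand-ins for the DynamoDB Streams event shape, not the SDK's type definitions. Skipping items that already have `geo` matters: writing the annotation back to the same table emits another `MODIFY` record, which would otherwise re-trigger the function in a loop.

```typescript
// Trimmed-down DynamoDB attribute value: only the variants used here.
type AttributeValue = { S: string } | { N: string } | { M: { [key: string]: AttributeValue } };

// Trimmed-down stand-in for one DynamoDB Streams record.
interface ProjectStreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: { NewImage?: { [key: string]: AttributeValue } };
}

// Hypothetical unit of work for the geocoding step.
interface GeoTask {
  id: string;
  city: string;
}

// Read a string attribute, or undefined if it is missing or not a string.
function getString(image: { [key: string]: AttributeValue }, key: string): string | undefined {
  const value = image[key];
  return value !== undefined && "S" in value ? value.S : undefined;
}

// Select the projects in a stream batch that still need geo-annotation.
function projectsNeedingGeo(records: ProjectStreamRecord[]): GeoTask[] {
  const tasks: GeoTask[] = [];
  for (const record of records) {
    if (record.eventName === "REMOVE") continue; // deleted items carry no NewImage
    const image = record.dynamodb.NewImage;
    if (!image || "geo" in image) continue; // already annotated: avoids update loops
    const id = getString(image, "id");
    const city = getString(image, "city");
    if (id && city) tasks.push({ id, city });
  }
  return tasks;
}

// Usage: only the un-annotated INSERT produces a task.
const exampleRecords: ProjectStreamRecord[] = [
  { eventName: "INSERT", dynamodb: { NewImage: { id: { S: "p1" }, city: { S: "Ghent" } } } },
  { eventName: "MODIFY", dynamodb: { NewImage: { id: { S: "p2" }, city: { S: "Lisbon" }, geo: { M: { lat: { N: "38.7" } } } } } },
  { eventName: "REMOVE", dynamodb: {} },
];
console.log(projectsNeedingGeo(exampleRecords));
```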

pvhee commented 7 years ago

After experimenting with several ways of writing to DynamoDB, dynamodb-wrapper seems the best fit for writing a lot of records without needing massive write throughput: https://www.npmjs.com/package/dynamodb-wrapper
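dynamodb-wrapper's own API is not reproduced here. As a hedged sketch of the general pattern such a wrapper takes care of, the code below splits items into `BatchWriteItem` groups of at most 25 (the per-call service limit) and re-sends any `UnprocessedItems` with exponential backoff; DynamoDB returns those items when a batch exceeds the table's provisioned throughput. The `BatchWriter` function type stands in for the real SDK call and, like `maxRetries` and `baseDelayMs`, is an assumption.

```typescript
// Hypothetical stand-in for a BatchWriteItem call on one table: it receives
// up to 25 items and resolves with the items DynamoDB left unprocessed.
type BatchWriter<T> = (items: T[]) => Promise<T[]>;

// DynamoDB accepts at most 25 put/delete requests per BatchWriteItem call.
const MAX_BATCH_WRITE_ITEMS = 25;

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Write all items in batches of 25, retrying unprocessed items with
// exponential backoff (baseDelayMs, 2x, 4x, ...). Returns the number of
// BatchWriteItem calls made.
async function writeAll<T>(
  items: T[],
  writeBatch: BatchWriter<T>,
  maxRetries: number = 5,
  baseDelayMs: number = 50,
): Promise<number> {
  let calls = 0;
  for (let start = 0; start < items.length; start += MAX_BATCH_WRITE_ITEMS) {
    let pending = items.slice(start, start + MAX_BATCH_WRITE_ITEMS);
    for (let attempt = 0; pending.length > 0; attempt++) {
      if (attempt > maxRetries) {
        throw new Error(`${pending.length} items still unprocessed after ${maxRetries} retries`);
      }
      if (attempt > 0) await sleep(baseDelayMs * 2 ** (attempt - 1)); // back off before a retry
      pending = await writeBatch(pending);
      calls++;
    }
  }
  return calls;
}
```

In a Lambda, `writeBatch` would wrap the SDK's batch write for a single table and return that table's `UnprocessedItems`. The backoff delays count against the function's timeout, so `maxRetries` and `baseDelayMs` need to fit within it.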

pvhee commented 7 years ago

Updated plan is to: