Closed by openjck 5 years ago
We should make sure that writes to S3 are atomic. We don't want to serve a partially written file. If the Lambda task fails, we should continue serving the old file.
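For what it's worth, S3's PutObject is itself atomic (readers see either the old object or the complete new one, never a partial write), so the main thing is to build and validate the entire payload before making a single upload call, and to let any failure abort before S3 is touched. A minimal sketch of that pattern, with hypothetical bucket/key names:

```python
import json

def build_payload(data):
    """Serialize and validate the full body up front, so bad data raises before upload."""
    body = json.dumps(data).encode("utf-8")
    json.loads(body)  # sanity check: payload is complete, well-formed JSON
    return body

def publish(s3_client, bucket, key, data):
    # Any exception in build_payload aborts here, leaving the old object untouched.
    body = build_payload(data)
    # A single PutObject is atomic from a reader's perspective.
    s3_client.put_object(Bucket=bucket, Key=key, Body=body,
                         ContentType="application/json")

# Usage (names are placeholders):
#   import boto3
#   publish(boto3.client("s3"), "ensemble-data", "data.json", {"ok": True})
```

Passing the client in also makes the function easy to exercise locally with a stub instead of real AWS credentials.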
I spoke with Blake about this today. We agreed that it doesn't make much sense to spend time setting up Lambda/S3 since DataOps is moving to GCP soon anyway.
Instead, we are going to continue to run this as a JSON server on Heroku for now. Blake will write a simple Lambda task that hits this server and stores the results in S3. ensemble can then hit that S3 bucket instead of hitting the server directly.
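The task Blake described could look roughly like the sketch below (the server URL, bucket, and key are placeholders, not the real ones): fetch the JSON from the Heroku server and only write to S3 if the fetch succeeds and parses, so a failed run leaves the previously cached object in place.

```python
import json
import urllib.request

SERVER_URL = "https://example.herokuapp.com/data.json"  # placeholder
BUCKET = "ensemble-cache"                               # placeholder
KEY = "data.json"                                       # placeholder

def fetch(url):
    """Fetch the JSON body from the server; raise on any non-200 response."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        if resp.status != 200:
            raise RuntimeError(f"server returned {resp.status}")
        return resp.read()

def handler(event, context, s3_client=None, fetch_fn=None):
    """Lambda entry point. s3_client/fetch_fn are injectable for local testing."""
    if s3_client is None:
        import boto3  # deferred so the module imports without AWS deps installed
        s3_client = boto3.client("s3")
    body = (fetch_fn or fetch)(SERVER_URL)  # raises on failure; S3 is never touched
    json.loads(body)                        # refuse to cache a malformed response
    s3_client.put_object(Bucket=BUCKET, Key=KEY, Body=body,
                         ContentType="application/json")
    return {"bytes": len(body)}
```

This keeps the failure mode we want: any error raises before `put_object`, and ensemble keeps reading the old cached file.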
Once DataOps moves to GCP, we can do what this GitHub Issue actually describes by rewriting this as a Cloud Function that writes directly to Google Cloud Storage.
DataOps isn't ready to manage a GCP setup yet, but should be around the end of Q3 / beginning of Q4. I should drive that process myself, rewriting this project around that time, and open a bug against Data Platform and Tools :: Operations when we're ready for DataOps to manage it.
A significant amount of work has already been done on this. In the process of supporting Redash, I refactored some things which will make this work much easier.
Blake has asked me to rewrite this as a Google Cloud Function that writes to S3 (yes, S3—for now anyway) whenever I'm ready. The next step is for me to get that working locally. Once that's done, I can send him my branch and we can work together to get it hosted.
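Since a Cloud Function is just Python, writing to S3 from GCP is mostly a matter of supplying AWS credentials via environment variables. A hypothetical sketch of the entry-point shape (the function names, env var, bucket, and key are placeholders, and `transpose()` stands in for the real transposer logic):

```python
import json
import os

BUCKET = os.environ.get("S3_BUCKET", "ensemble-cache")  # placeholder
KEY = "data.json"                                       # placeholder

def transpose():
    """Stand-in for the real transposer output; returns a JSON-serializable result."""
    return {"status": "ok"}

def main(request, s3_client=None):
    """HTTP-triggered Cloud Function entry point; s3_client is injectable for local runs."""
    if s3_client is None:
        import boto3  # AWS creds come from AWS_ACCESS_KEY_ID etc. in the environment
        s3_client = boto3.client("s3")
    body = json.dumps(transpose()).encode("utf-8")
    s3_client.put_object(Bucket=BUCKET, Key=KEY, Body=body,
                         ContentType="application/json")
    return f"wrote {len(body)} bytes to s3://{BUCKET}/{KEY}"
```

Running it locally against a stub client should be enough to get the branch to a state worth sending to Blake.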
I spoke with Blake about this today. I'm planning to have this done by the end of February. The plan is still to have a Cloud Function that writes to S3.
I've pushed my progress here and I'm currently working on getting a GCP development setup.
Blake recommended using this code as a guide. This is the script that currently caches ensemble-transposer data on S3. It's a good example of using the AWS SDK.
https://gist.github.com/robotblake/04d3e0795a9c254af896ec317cf7a8dc
I've spoken with Frank and Blake about this approach. They both think it would be better than the current approach.
This project could be rewritten as an AWS Lambda function that writes data to S3 on a regular interval (say, every 24 hours). The advantages include:
The downsides include:
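Mechanically, the "every 24 hours" interval would come from a CloudWatch Events (EventBridge) schedule rule rather than from the function code. A hypothetical config sketch, with placeholder names (`transposer-cache`, `daily-cache`) and a truncated ARN:

```shell
# Create a rule that fires once every 24 hours.
aws events put-rule --name daily-cache --schedule-expression "rate(24 hours)"

# Allow CloudWatch Events to invoke the Lambda function.
aws lambda add-permission --function-name transposer-cache \
  --statement-id daily-cache --action lambda:InvokeFunction \
  --principal events.amazonaws.com

# Point the rule at the function (ARN elided).
aws events put-targets --rule daily-cache \
  --targets "Id"="1","Arn"="arn:aws:lambda:...:function:transposer-cache"
```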