mathcolo closed 10 months ago
I think there's also another option: d) Upload from disk at a steady interval that is close to live.
Oh, yeah, lol, that could work! If you do that, please keep it separate from the MBTA-provided events (maybe use a new key, `Events-ours` instead of `Events`, inside the bucket?). It could also be handy to keep track of uploaded sha256 hashes so that files that haven't changed don't get uploaded again.
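The hash-tracking idea could look roughly like the sketch below. It is only an illustration: the function names, the JSON manifest file, and the `upload` callable are hypothetical and not taken from the codebase, and the `Events-ours` prefix is just the key suggested above.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_changed(
    root: Path,
    manifest_path: Path,
    upload: Callable[[Path, str], None],
    prefix: str = "Events-ours",  # assumption: the separate key suggested in this thread
) -> List[str]:
    """Upload only the CSV files whose hash differs from the last recorded one.

    `upload(local_path, key)` is a hypothetical hook; the manifest is a plain
    JSON file mapping relative paths to the digest that was last uploaded.
    """
    manifest: Dict[str, str] = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())

    uploaded: List[str] = []
    for path in sorted(root.rglob("*.csv")):
        rel = path.relative_to(root).as_posix()
        digest = sha256_of(path)
        if manifest.get(rel) == digest:
            continue  # unchanged since the last upload, so skip it
        key = f"{prefix}/{rel}"
        upload(path, key)
        manifest[rel] = digest
        uploaded.append(key)

    # Persist the digests so the next run can skip unchanged files.
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return uploaded
```

With boto3, the `upload` hook could be something like `lambda p, k: s3.upload_file(str(p), bucket, k)`, where `s3` is a boto3 S3 client and `bucket` is whatever bucket name is in use (both assumptions here).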
Also e) Dynamo?
@hamima-halim @mathcolo I won't have time in the next few weeks to work on this, so feel free to grab it. Personally, I think S3 upload will be the easiest to work with on the dashboard side.
I can take this! PR incoming, will do the extremely chill thing of 30-minute-interval scheduled upload jobs.
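For illustration only, a 30-minute scheduled job can be as small as the loop below. This is a sketch, not the code from the PR: `run_on_interval` and its parameters are hypothetical names, and a cron entry or a scheduling library would do the same job.

```python
import time
from typing import Callable, Optional

UPLOAD_INTERVAL_SECONDS = 30 * 60  # the 30-minute interval mentioned above


def run_on_interval(
    job: Callable[[], None],
    interval_seconds: float = UPLOAD_INTERVAL_SECONDS,
    max_runs: Optional[int] = None,  # None means run forever
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `job` repeatedly, aiming for one start every `interval_seconds`.

    A failing job is logged and retried on the next tick rather than
    stopping the loop. Returns the number of runs performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = clock()
        try:
            job()
        except Exception as exc:  # keep the schedule alive if one upload fails
            print(f"upload job failed: {exc!r}")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Subtract the time the job took so start times stay roughly aligned.
        elapsed = clock() - started
        sleep(max(0.0, interval_seconds - elapsed))
    return runs
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without waiting; in normal use the defaults are fine.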
So right now events are being written to disk on the instance. The data dashboard needs to get a hold of them somehow, though. I see a few distinct options...
a) Upload the events files to S3 overnight every night, accepting that we just won't have live bus or CR. (lame)
b) Every time we append to an events.csv on disk, upload the entire thing to S3. (maybe, but like...no)
c) Serve live events over http that the dashboard can request on-demand.
My hunch is that we want to do (c), with some (a) sprinkled in. It's cool when things are live, and we shouldn't give that up. So rough steps:
`express` server that serves up events from `./output`
`..labs` DNS record with the wildcard cert for https.

FAQ
a) Why cannot the dashboard talk to the EC2 instance via its private IP, such that we can keep the EC2 instance off the public internet? That's possible, but it's a pain.
b) If the EC2 instance has a public IP address, can the dashboard lambda just talk to that? Yes, it could. But the load balancer option lets us easily add https using the wildcard cert, which, even with no private data involved, is good citizenry.
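To make option (c) concrete, here is a minimal sketch of serving the files under `./output` over HTTP. The thread proposes an express server; this sketch swaps in Python's standard-library `http.server` instead so that it is self-contained. The function name, host, and port are assumptions, and HTTPS is left to whatever sits in front of the instance.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_events_server(
    directory: str = "./output",  # where events are written on the instance
    host: str = "127.0.0.1",      # assumption: bind locally, expose via a proxy
    port: int = 8000,             # assumption: arbitrary port
) -> ThreadingHTTPServer:
    """Build a read-only static file server rooted at `directory`.

    SimpleHTTPRequestHandler only answers GET and HEAD, so clients can
    fetch event files on demand but cannot modify them.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)


# To run it:
#   server = make_events_server()
#   server.serve_forever()
```

A request for a path such as `/events.csv` then returns the file of that name under the served directory, so the dashboard always sees whatever has been written to disk so far.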