olafveerman opened 8 years ago
For reference, my guess is that researchers will share location-specific data (perhaps in batches from different locations). I think most typically they will have a txt or csv file spit out from a single instrument (that may measure one or more pollutants) at an individual geographic location. They will modify that single-instrument, single-location file to fit the criteria we specify as csv to upload. Does that make sense? Anyone have other thoughts?
I don't know if we will have any form that captures data common to all data points from the same location (e.g. geographic coordinates, attribution, etc.), or if it is better to repeat that information in the file for each data point.
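To make the tradeoff concrete, a fully self-describing file would repeat the location metadata on every row, something like this (column names are hypothetical, not a settled schema):

```csv
location,latitude,longitude,attribution,parameter,unit,value,date_utc
"Station A",-1.286,36.817,"Example University",pm25,µg/m³,12.4,2016-10-18T08:00:00Z
"Station A",-1.286,36.817,"Example University",pm25,µg/m³,13.1,2016-10-18T09:00:00Z
```

Repeating the common fields keeps every row independent and the parser simple, at the cost of redundancy; a separate header section would avoid the duplication but complicates the format.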
@olafveerman server-side logic can live with the uploader, but we'll need a different project for the admin panel.
@danielfdsilva what sort of server-side logic do you think the uploader will need? The piece that gets a signed URL for the S3 PUT should live in the API, I think.
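For what it's worth, the signed-URL piece is only a few lines with the aws-sdk; a minimal sketch, with the bucket name and key scheme as placeholders:

```typescript
import * as AWS from 'aws-sdk';

const s3 = new AWS.S3();

// Returns a short-lived URL the browser can PUT the CSV to directly,
// so the file never has to pass through our own servers.
function getUploadUrl(filename: string): string {
  return s3.getSignedUrl('putObject', {
    Bucket: 'openaq-bulk-uploads',            // placeholder bucket name
    Key: `uploads/${Date.now()}-${filename}`, // placeholder key scheme
    Expires: 600,                             // valid for 10 minutes
    ContentType: 'text/csv'
  });
}
```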
Just talked with @RocketD0g about the admin panel and I don't think we should implement it at this point. The time/difficulty needed to implement it won't be worth the number of people expected to use it. The other tricky bit here is that the database is now shared publicly, so I'm not sure how we'd store the private upload keys (maybe encrypted?) anyway. Then I thought maybe we could use a separate DB just for this, but that doesn't seem great either.
My recommended approach is to have an email link on the form where they can request a token (sent to info@openaq.org). I'll pregenerate 10 or so tokens that can be handed back out to people. This seems like an easier approach for the expected amount of usage and lets us focus on some of the other things.
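For reference, pregenerating is a few lines with Node's crypto module, and if the keys do end up in the shared database we could store only a hash of each one, so the public dump never contains a usable token (a sketch, not a decision):

```typescript
import { randomBytes, createHash } from 'crypto';

// Generate 10 upload tokens. The raw token goes back to the requester;
// only its SHA-256 hash would be stored on our side for verification.
for (let i = 0; i < 10; i++) {
  const token = randomBytes(24).toString('hex');
  const hash = createHash('sha256').update(token).digest('hex');
  console.log(`token: ${token}\nstored hash: ${hash}\n`);
}
```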
Sound good to everyone?
@jflasher true that the token management system has a couple of endpoints, but we'd be reusing the OAM one, so the only work involved would be restyling it to fit OpenAQ's. And yes, we'd need a MongoDB instance to store the keys.
The pregen-token approach is simpler, but not as easy to manage.
I think for now let’s stick with pregen. Definitely agree it’s not as easy to manage, but I don’t want to have to manage the admin instance and a MongoDB instance just for this.
Ticket to track and discuss the overall scope of openaq-upload v1 (still looking for witty project name). This uploader will live on something like http://upload.openaq.org and provide researchers and scientists with a way to bulk upload measurements in CSV format.
Authentication
Data can be uploaded by people with a Bulk Upload Key. These keys are managed by the OpenAQ moderators. We will implement a lightweight system to manage keys and associated metadata like email addresses. This will be heavily borrowed from OAM.
@danielfdsilva Do we need to set up a separate project for this?
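However the keys end up being managed, the check at upload time stays small. A rough sketch, assuming an Express-style API (the header name and lookup helper are hypothetical):

```typescript
import { Request, Response, NextFunction } from 'express';

// Hypothetical lookup against wherever the key (or its hash) is stored.
declare function isValidUploadKey(key: string): Promise<boolean>;

// Refuse to hand out a signed upload URL unless the request
// carries a valid Bulk Upload Key.
export async function requireUploadKey(req: Request, res: Response, next: NextFunction) {
  const key = req.get('x-bulk-upload-key');
  if (!key || !(await isValidUploadKey(key))) {
    return res.status(401).json({ error: 'A valid Bulk Upload Key is required' });
  }
  next();
}
```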
Upload workflow
The main upload workflow will be:
Behind the scenes
The CSV file will be treated as any other source. A specific `batch_upload` adapter checks the S3 bucket every 10 minutes. If a file is detected, the adapter processes and stores the data. On success, it will send an email to the associated person.

Comments, questions? @jflasher @ascalamogna @danielfdsilva @nbumbarger @RocketD0g
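For concreteness, the polling adapter could be as small as this sketch (bucket name, prefix, and the processing/notification helpers are placeholders):

```typescript
import * as AWS from 'aws-sdk';

const s3 = new AWS.S3();

// Hypothetical helpers: parse/insert the measurements, notify the uploader.
declare function processCsv(body: string): Promise<void>;
declare function emailUploader(key: string): Promise<void>;

// Scheduled to run every 10 minutes: pick up any CSVs that have
// appeared in the bucket and push their measurements into the pipeline.
export async function batchUpload(): Promise<void> {
  const listed = await s3
    .listObjectsV2({ Bucket: 'openaq-bulk-uploads', Prefix: 'uploads/' })
    .promise();

  for (const obj of listed.Contents || []) {
    if (!obj.Key) continue;
    const file = await s3
      .getObject({ Bucket: 'openaq-bulk-uploads', Key: obj.Key })
      .promise();
    await processCsv(String(file.Body));
    await emailUploader(obj.Key);
  }
}
```

A real version would also move or tag processed files so the next poll doesn't pick them up again.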