ohcnetwork / life

Verified Crowd-Sourced Emergency Services Directory
https://life.coronasafe.network/

Create a new Github action that will sync the data #167

Closed bodhish closed 3 years ago

bodhish commented 3 years ago

Create a new Github action that could fetch the JSON from our public bucket and store it in the data folder.

Sample data: https://life_data.coronasafe.network/oxygen.json

vpremk commented 3 years ago

Wooo I can look at Python code

vpremk commented 3 years ago

@bodhish Do we have an S3 bucket that I can use, or shall I create one for now to get going?

bodhish commented 3 years ago

You may not need the bucket. The data will be written to the bucket (https://life_data.coronasafe.network/oxygen.json) via our app.

We want the data from the bucket to be saved to this folder: https://github.com/coronasafe/life/tree/main/data (we will also know the URLs for each file).

Example of GitHub action https://github.com/coronasafe/life/blob/main/.github/workflows/update_json.yml

The Python script should do 2 things.

  1. Download and write all files (assume multiple links) to the repo.
  2. Generate active_districts_v2.json from the loaded data.
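A minimal sketch of such a script (the URL array is an assumption, since only oxygen.json is confirmed in this thread, and the step 2 aggregation is left as a stub):

```python
import json
import os
import urllib.request

# Assumed source list; only oxygen.json is confirmed in this thread.
SOURCE_URLS = [
    "https://life_data.coronasafe.network/oxygen.json",
]

def fetch_json(url):
    """Fetch and parse one JSON file over plain HTTPS."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def sync(urls=SOURCE_URLS, data_dir="data"):
    """Step 1: download every file and save it under data/."""
    os.makedirs(data_dir, exist_ok=True)
    loaded = {}
    for url in urls:
        name = url.rsplit("/", 1)[-1]          # e.g. "oxygen.json"
        loaded[name] = fetch_json(url)
        with open(os.path.join(data_dir, name), "w") as f:
            json.dump(loaded[name], f, indent=2)
    # Step 2: generate active_districts_v2.json from `loaded` here,
    # once the aggregation rules are settled.
    return loaded
```

Because it only does HTTP GETs, the script needs no bucket credentials and keeps working if the files move to a different host.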

@vandanabhandari would be happy to answer if you have more questions 😄

vpremk commented 3 years ago

@bodhish I'm not clear on the "bucket". Do you mean an AWS S3 bucket? The link you gave doesn't look like one: https://life_data.coronasafe.network/oxygen.json

I'm using the boto3 S3 library in Python, with `list = s3.list_objects(Bucket='BUCKET_NAME')['Contents']`.

1) If there are multiple links, can I assume all the URLs match https://life_data.coronasafe.network/*.json? 2) If all the data is in https://life_data.coronasafe.network/oxygen.json and the files need to be generated from it, then I will parse the data in oxygen.json.

bodhish commented 3 years ago

Listing the bucket is disabled by policy. I was thinking we could do a fetch request. We can update the list of links at a later stage; you can assume it is an array of links.

You are right; the bucket is currently on DigitalOcean. We will be switching to S3 with a CloudFront layer in front of it at a later stage. (That's the reason I was thinking we should use an HTTP fetch, so that we don't have to update the code if we change the source at any point.)

vpremk commented 3 years ago

Got it, so I will use the requests lib to fetch from a list of links, with the list hardcoded in the Python file, like:

  1. Initialize a list `list_of_files`:

     `list_of_files = [https://life_data.coronasafe.network/oxygen.json, https://life_data.coronasafe.network/xyz.json, https://life_data.coronasafe.network/abc.json]`

  2. Iterate through the list.
  3. Download oxygen.json, xyz.json, abc.json into data/oxygen_v2.json, data/xyz_v2.json, data/abc_v2.json.
  4. To generate active_districts_v2.json, should I parse it from oxygen.json, similar to what is in `scrapper.get_active_district_data`?
bodhish commented 3 years ago

  1. You just have to download and save the file to data/oxygen_v2.json.

When we add a vaccine link to the array (assume the file name will be vaccine.json), it should create vaccine_v2.json.
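That URL-to-filename rule could be sketched as a small helper (the name `target_filename` is hypothetical; the `_v2` suffix convention is taken from this thread):

```python
def target_filename(url):
    """Map a bucket URL to its repo filename,
    e.g. .../vaccine.json -> vaccine_v2.json."""
    name = url.rsplit("/", 1)[-1]      # "vaccine.json"
    stem, ext = name.rsplit(".", 1)    # "vaccine", "json"
    return f"{stem}_v2.{ext}"
```

With this, adding a new link to the array requires no other code change.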

bodhish commented 3 years ago

  1. True
  2. True
  3. True
  4. I am not sure what that is, but what we are trying to do is create a JSON that says, for a given district, which resources are available.

Example: kottayam appears in both oxygen.json and vaccine_v2.json.

Example output will be:

[
    {
        "ambulance": false,
        "contact": false,
        "doctor": false,
        "helpline": false,
        "hospitals": false,
        "medicine": false,
        "oxygen": true,
        "vaccine": true,
        "district": "kottayam",
        "state": "kerala"
    }
]

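That per-district aggregation could be sketched as below, assuming each source file has first been reduced to a set of (state, district) pairs (the internal layout of oxygen.json is not shown in this thread, so that reduction step is left out):

```python
RESOURCES = ["ambulance", "contact", "doctor", "helpline",
             "hospitals", "medicine", "oxygen", "vaccine"]

def build_active_districts(per_resource):
    """per_resource maps a resource name (e.g. "oxygen") to the set
    of (state, district) pairs where that resource has entries."""
    # Collect every district that appears in at least one file.
    all_pairs = set()
    for pairs in per_resource.values():
        all_pairs |= pairs
    out = []
    for state, district in sorted(all_pairs):
        # One boolean per resource, true if the district has entries.
        row = {r: (state, district) in per_resource.get(r, set())
               for r in RESOURCES}
        row["district"] = district
        row["state"] = state
        out.append(row)
    return out
```

For the kottayam example above, feeding it oxygen and vaccine sets that both contain ("kerala", "kottayam") yields one row with only "oxygen" and "vaccine" set to true.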
vpremk commented 3 years ago

oh, looks like https://life_data.coronasafe.network/oxygen.json now returns AccessDenied:

```
AccessDenied
life-data tx0000000000000118979e8-0060887bfa-131f0da-sgp1b 131f0da-sgp1b-sgp1-zg02
```

bodhish commented 3 years ago

@vandanabhandari we just deleted the data. It should be live in 10 minutes (infrastructure switch). I have shared a copy of the file over slack.

bodhish commented 3 years ago

@vandanabhandari It's back. The link should be working now

vpremk commented 3 years ago

https://github.com/coronasafe/life/pull/172