safe-refuge / safeway-data

Data mining tools for the Safeway app

Update PointOfInterest schema #21

Closed: littlepea closed this issue 2 years ago

littlepea commented 2 years ago

The PointOfInterest schema has changed, and the actual database now has more fields, so we need to update our model accordingly.

New schema example:

{
      "phone": null,
      "email": null,
      "url": null,
      "socialmedia": null,
      "messanger": null,
      "name": "Művelődési Ház",
      "description": "You can apply for refugee status here",
      "categories": [
        "Information"
      ],
      "organizations": [
        "Hungarian state"
      ],
      "open_hours": null,
      "tags": null,
      "icon": null,
      "approved": true,
      "active": true,
      "address": null,
      "city": null,
      "country": "Hungary"
}
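As an illustration only, the example above could be mirrored by a model along these lines. This is a minimal sketch using a standard-library dataclass, not the project's actual model class. The field names come from the example; the types of fields that are `null` in the example, the choice of required fields, and the defaults for `approved` and `active` are all assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PointOfInterest:
    # Treated as required in this sketch because the example populates them.
    name: str
    country: str
    description: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    organizations: List[str] = field(default_factory=list)
    # These are null in the example, so the types below are guesses.
    phone: Optional[str] = None
    email: Optional[str] = None
    url: Optional[str] = None
    socialmedia: Optional[str] = None
    messanger: Optional[str] = None  # spelled as in the schema example
    open_hours: Optional[str] = None
    tags: Optional[List[str]] = None
    icon: Optional[str] = None
    address: Optional[str] = None
    city: Optional[str] = None
    # Default values here are assumptions, not taken from the real schema.
    approved: bool = False
    active: bool = True


poi = PointOfInterest(
    name="Művelődési Ház",
    country="Hungary",
    description="You can apply for refugee status here",
    categories=["Information"],
    organizations=["Hungarian state"],
    approved=True,
)
record = asdict(poi)  # plain dict with the same keys as the schema example
```

Keeping the attribute names identical to the JSON keys (including `messanger`) means `asdict` produces a record that lines up with the database fields without a separate mapping step.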

Then we need to make sure that data converts correctly from the source spreadsheet.
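A rough sketch of what such a conversion check might look like, assuming the spreadsheet is exported as CSV. The column names, the comma-separated encoding of list fields, and the helper names (`clean`, `split_list`, `row_to_record`) are hypothetical; the real spreadsheet layout may differ.

```python
import csv
import io
from typing import Dict, List, Optional

# Assumption: these columns hold comma-separated lists in the spreadsheet.
LIST_FIELDS = ("categories", "organizations")


def clean(value: Optional[str]) -> Optional[str]:
    """Turn blank or whitespace-only cells into None."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def split_list(value: Optional[str]) -> List[str]:
    """Split a comma-separated cell into a list of non-empty items."""
    cleaned = clean(value)
    if cleaned is None:
        return []
    return [part.strip() for part in cleaned.split(",") if part.strip()]


def row_to_record(row: Dict[str, str]) -> Dict[str, object]:
    """Convert one spreadsheet row into a schema-shaped dict."""
    record: Dict[str, object] = {key: clean(value) for key, value in row.items()}
    for key in LIST_FIELDS:
        record[key] = split_list(row.get(key))
    return record


# Hypothetical sample row; the empty last cell stands for a missing phone.
SAMPLE_CSV = (
    "name,country,categories,organizations,phone\n"
    "Művelődési Ház,Hungary,Information,Hungarian state,\n"
)
rows = [row_to_record(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Mapping empty cells to `None` keeps the output consistent with the `null` values in the schema example, and a small fixture like `SAMPLE_CSV` makes the conversion testable without touching the real sheet.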

P.S. Once this task is done, we'll need to update all the spiders.

moorchegue commented 2 years ago

I'd take this! Although I might need some explanation of how the data flow works, and perhaps some help setting it up with the right env vars.

What's the overall data flow? We scrape things with spiders and store them as CSV, right? Do I understand correctly that the source of truth (the database) is a Google spreadsheet though? What's the relationship between these? Is there a sync mechanism both ways? Do I need to give you my Google account info to access it? Is there a way for me to set up a testing sheet, or even better an offline version?

Hope this is not too many questions, and answering them is not more time consuming than writing the actual code for this issue :)

littlepea commented 2 years ago

@moorchegue the data flow is described in the README: https://github.com/littlepea/safeway-data/blob/master/README.md#data-flow

> Is there a sync mechanism both ways?

No, it's one way, from spreadsheet/CSV to CSV.

> Do I need to give you my Google account info to access it?

No, just ask me for the API key in private.

> Is there a way for me to set up a testing sheet, or even better an offline version?

You can use the real sheet. There's no need for a test one, because we access it read-only anyway.
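For orientation, read-only access with an API key could look roughly like the sketch below, which builds a request URL for the `spreadsheets.values.get` endpoint of the Google Sheets API v4. Whether the project reads the sheet this way is an assumption; the environment variable name `GOOGLE_API_KEY`, the spreadsheet ID, and the cell range are placeholders, not values from the repository.

```python
import os
from urllib.parse import quote, urlencode

SHEETS_API = "https://sheets.googleapis.com/v4/spreadsheets"


def sheet_values_url(spreadsheet_id: str, cell_range: str, api_key: str) -> str:
    """Build a read-only 'values.get' URL for the Google Sheets API v4."""
    path = (
        f"{SHEETS_API}/{quote(spreadsheet_id, safe='')}"
        f"/values/{quote(cell_range, safe='')}"
    )
    return f"{path}?{urlencode({'key': api_key})}"


# Hypothetical env var name; the key itself is shared privately, never committed.
api_key = os.environ.get("GOOGLE_API_KEY", "dummy-key")
url = sheet_values_url("SPREADSHEET_ID", "Sheet1!A1:Z", api_key)
```

A GET request to such a URL only reads cell values, so pointing a development setup at the real sheet does not risk modifying it.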

littlepea commented 2 years ago

@moorchegue can we finalize this PR? I need to rerun all the spiders with the new schema so that we can load the points into the app

moorchegue commented 2 years ago

Sure, let me get back to this tonight. I think there's only one outstanding issue left here.