hackforla / food-oasis

Repository for the current redevelopment of the Food Oasis Los Angeles website
https://foodoasis.la
GNU General Public License v2.0

Review all scraped data to determine if we can normalize it #242

Closed ExperimentsInHonesty closed 3 years ago

ExperimentsInHonesty commented 4 years ago

Overview

We have scraped a lot of data from various websites. We need to see if it's possible to normalize it into one data set from which we can verify the listings.

Action Items

Please add your availability for a Zoom meeting to the comments.

Resources/Instructions

Scraped resources google drive folder

chrislopez28 commented 4 years ago

I'm tied up this week until Feb 2, but after that I'm pretty open.

I'm available Mondays 9AM-4PM; and weekday nights 8:30PM-10:30PM.

ExperimentsInHonesty commented 4 years ago

@chrislopez28 I can do Monday at 10am if that works for you. I think @chombus is not available during the day, but we can review this without him for now. I'll reach out to you on Slack to confirm your availability for this Monday.

chrislopez28 commented 4 years ago

Experimenting with matching potential duplicate entries between scraped files.

Pantries (Food Finders and LMS): closest entry by haversine ("as the crow flies") distance in meters: https://github.com/chrislopez28/fola-data-normalization/blob/master/export/pantries_dist.csv
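
For readers unfamiliar with the approach, here is a minimal sketch of nearest-entry matching by haversine distance. It is not the code from the linked repository; the field names (`name`, `lat`, `lon`), the sample records, and the helper names `haversine_m` and `closest_entry` are all hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def closest_entry(entry, candidates):
    """Return (candidate, distance_m) for the candidate nearest to entry."""
    pairs = (
        (c, haversine_m(entry["lat"], entry["lon"], c["lat"], c["lon"]))
        for c in candidates
    )
    return min(pairs, key=lambda pair: pair[1])


# Hypothetical records standing in for rows from two scraped files.
source_a = {"name": "Example Pantry", "lat": 34.0500, "lon": -118.2500}
source_b = [
    {"name": "Example Food Pantry", "lat": 34.0501, "lon": -118.2502},
    {"name": "Another Pantry", "lat": 34.1000, "lon": -118.3000},
]

match, distance_m = closest_entry(source_a, source_b)
print(match["name"], round(distance_m), "m")
```

A small distance only suggests a possible duplicate. Two different organizations can share an address, so a result like this would still need a name comparison or a manual check.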

Farmers Markets (CalFresh and LMS): most similar name string, found by taking the entry with the minimum Levenshtein distance: https://github.com/chrislopez28/fola-data-normalization/blob/master/export/markets_stringdist.csv
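
As an illustration of the name-matching idea, here is a minimal sketch that picks the candidate with the smallest Levenshtein (edit) distance to a given name. It is not the code from the linked repository; the helper names `levenshtein` and `most_similar_name`, the case-insensitive comparison, and the sample market names are assumptions made for this example.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                )
            )
        previous = current
    return previous[-1]


def most_similar_name(name, candidates):
    """Return the candidate whose name has the minimum edit distance to name."""
    key = name.casefold()  # assumption: compare names case-insensitively
    return min(candidates, key=lambda c: levenshtein(key, c.casefold()))


# Hypothetical names standing in for entries from two scraped lists.
other_list = ["Hollywood Farmers' Market", "Santa Monica Farmers Market"]
print(most_similar_name("Hollywood Farmers Market", other_list))
```

The minimum-distance entry is always returned, even when nothing in the other list is a real match, so a distance threshold or manual review would be needed before treating a pair as duplicates.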

ExperimentsInHonesty commented 4 years ago

See these links for references on what type of information we are collecting. Stakeholder details (part of current design efforts): https://github.com/hackforla/food-oasis/issues/178#issuecomment-565886143

AIRS https://github.com/hackforla/food-oasis/projects/6

ExperimentsInHonesty commented 4 years ago

See the jobs-for-hope repo backend for scraper examples.

ExperimentsInHonesty commented 4 years ago

A discussion between @chrislopez28 and @ExperimentsInHonesty determined that we need more data for him to work with, so he is going to get some scrapers working, starting with issue #95. We will come back to this issue once more of the scrapers are active.

chrislopez28 commented 4 years ago

Progress:

To Do: