livgust / macovidvaccines.com

macovidvaccines.com
MIT License

Filtering by location #3

Closed livgust closed 3 years ago

livgust commented 3 years ago

I don't have a particular solution in mind, but I know a standard one would be to have a ZIP filter and then have a radius of X miles.

dldereklee commented 3 years ago

We can add the Distance Matrix API from Google, but we would need geo data (lat, long) for each location from the scraper.

linusmarco commented 3 years ago

The Distance Matrix API does allow you to pass addresses, although it would be more reliable if the back end returned lat/lon for each location.

https://developers.google.com/maps/documentation/distance-matrix/overview

johnhawkinson commented 3 years ago

> Can add the Distance Matrix API from Google, but would need to have geo data (lat, long) for each location from the scraper

We need it for other purposes too (visualization, see #12). My intention was to cache it in a separately maintained geo.json that clients would download, and to geocode new site addresses only periodically, because new sites are rare.

But maybe there's a better architecture.

linusmarco commented 3 years ago

Until we get the vaccination site geocoding implemented on the back end, we could add a simple ZIP code filter (or maybe a county filter, since we can derive the county from the ZIP code).
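
A stopgap filter along those lines could be sketched as follows. This is only an illustration: it assumes each site object carries a `zip` string, and `ZIP_TO_COUNTY` is a hypothetical hand-made sample table, not data from the project.

```javascript
// Hypothetical sample; a real table would cover every Massachusetts ZIP code.
const ZIP_TO_COUNTY = {
  "01550": "Worcester",
  "01569": "Worcester",
  "02139": "Middlesex",
};

// Keep only sites whose ZIP code matches exactly.
function filterByZip(sites, zip) {
  return sites.filter((site) => site.zip === zip);
}

// Keep only sites in the same county as the given ZIP code.
// Returns an empty list when the ZIP code is not in the table.
function filterByCounty(sites, zip) {
  const county = ZIP_TO_COUNTY[zip];
  if (!county) return [];
  return sites.filter((site) => ZIP_TO_COUNTY[site.zip] === county);
}
```

The county variant is more forgiving than an exact ZIP match, at the cost of shipping the lookup table to the client.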

dldereklee commented 3 years ago

If you have the street and the ZIP code of the location, the geocoding service is pretty stable. We can add it to the scraper, which currently gets the street and ZIP code.

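const { Client } = require("@googlemaps/google-maps-services-js");

// Geocode a street + ZIP code pair with the Google Geocoding API.
// Returns the raw response body, or null if the request fails.
const getGeocode = async (street, zip) => {
  const address = `${street},${zip}`;
  const client = new Client();
  try {
    const resp = await client.geocode({
      params: {
        address,
        key: process.env.GOOGLE_API_KEY,
      },
    });
    return resp.data;
  } catch (e) {
    // e.response is undefined for network-level failures, so guard the access.
    console.error(e.response ? e.response.data : e.message);
    return null;
  }
};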

Response

{
    "results": [
        {
            "address_components": [REDACTED],
            "formatted_address": "153 Chestnut St, Southbridge, MA 01550, USA",
            "geometry": {
                "bounds": {
                    "northeast": {
                        "lat": 42.0687815,
                        "lng": -72.0273536
                    },
                    "southwest": {
                        "lat": 42.0682489,
                        "lng": -72.02792219999999
                    }
                },
                "location": {
                    "lat": 42.0685246,
                    "lng": -72.0276497
                },
                "location_type": "ROOFTOP",
                "viewport": {
                    "northeast": {
                        "lat": 42.0698641802915,
                        "lng": -72.02628891970849
                    },
                    "southwest": {
                        "lat": 42.0671662197085,
                        "lng": -72.0289868802915
                    }
                }
            },
            "place_id": "ChIJO63BQemh5okR5NchkhUJ038",
            "types": [
                "premise"
            ]
        }
    ],
    "status": "OK"
}

johnhawkinson commented 3 years ago

Geocoding costs a small amount of money and is slow. It is better to cache the results than to query the geocoder every time in each scraper.

dldereklee commented 3 years ago

Agreed. Caching is definitely the way to go. I was suggesting to run it on only newly scraped addresses.

Although if there aren't many sites being added, you could just do this manually as you suggested.
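
The "geocode only newly scraped addresses" idea could be sketched roughly like this. The cache shape (a plain object keyed by street and ZIP code) and the helper names are assumptions for illustration, not the project's actual code; `geocode` stands for any async `(street, zip)` function such as `getGeocode` above.

```javascript
// Cache key for a site; assumes street + ZIP code identifies an address.
function cacheKey(site) {
  return `${site.street},${site.zip}`;
}

// Return the sites whose address has no cached geocode yet.
function findUncached(sites, cache) {
  return sites.filter(
    (site) => !Object.prototype.hasOwnProperty.call(cache, cacheKey(site))
  );
}

// Geocode only the uncached sites and store successful results in the cache.
async function updateCache(sites, cache, geocode) {
  for (const site of findUncached(sites, cache)) {
    const result = await geocode(site.street, site.zip);
    if (result) cache[cacheKey(site)] = result;
  }
  return cache;
}
```

Because the cache is a plain object, it can be written out with `JSON.stringify` and reloaded on the next scraper run, so the paid API is only hit for addresses that have never been seen.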

dldereklee commented 3 years ago

Assuming we can get lat/long into the scraper outputs, I have this: https://github.com/dldereklee/macovidvaccines.com/commit/3b14c4ccfe564b85c1e589d657c41144e6e2f7d6

Note: The distance right now is just calculated from a random lat/long.
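
For reference, one common way to compute the distance between two lat/long points is the haversine formula. The sketch below is not taken from the linked commit; the function names and the `location` field on each site are assumptions chosen to match the `geometry.location` shape in the geocoder response.

```javascript
const EARTH_RADIUS_MILES = 3958.8;

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in miles between two { lat, lng } points (haversine formula).
function distanceMiles(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

// Keep sites within radiusMiles of origin, nearest first.
function sitesWithinRadius(sites, origin, radiusMiles) {
  return sites
    .map((site) => ({ ...site, miles: distanceMiles(origin, site.location) }))
    .filter((site) => site.miles <= radiusMiles)
    .sort((x, y) => x.miles - y.miles);
}
```

This gives straight-line distance rather than driving distance, which is usually enough for a "within X miles" filter. With this formula, the two geocoded addresses that appear in this thread (Southbridge and Uxbridge) come out roughly 20 miles apart.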

johnhawkinson commented 3 years ago

Let's assume for the moment that the client imports a separate file, say geo.json, indexed by results[].name, that holds the results from Google's geocoder and looks like the example below.

Can we trust that name is a unique key?

(In the alternative, perhaps we can cut it down to just the long/lat, rather than the full geocoder output.)

Example geo.json:

{
      "Hannaford (Uxbridge)": [
        {
          "address_components": [
            {
              "long_name": "3",
              "short_name": "3",
              "types": [
                "subpremise"
              ]
            },
            {
              "long_name": "158",
              "short_name": "158",
              "types": [
                "street_number"
              ]
            },
            {
              "long_name": "North Main Street",
              "short_name": "N Main St",
              "types": [
                "route"
              ]
            },
            {
              "long_name": "Uxbridge",
              "short_name": "Uxbridge",
              "types": [
                "locality",
                "political"
              ]
            },
            {
              "long_name": "Worcester County",
              "short_name": "Worcester County",
              "types": [
                "administrative_area_level_2",
                "political"
              ]
            },
            {
              "long_name": "Massachusetts",
              "short_name": "MA",
              "types": [
                "administrative_area_level_1",
                "political"
              ]
            },
            {
              "long_name": "United States",
              "short_name": "US",
              "types": [
                "country",
                "political"
              ]
            },
            {
              "long_name": "01569",
              "short_name": "01569",
              "types": [
                "postal_code"
              ]
            }
          ],
          "formatted_address": "158 N Main St #3, Uxbridge, MA 01569, USA",
          "geometry": {
            "bounds": {
              "northeast": {
                "lat": 42.0829484,
                "lng": -71.63740469999999
              },
              "southwest": {
                "lat": 42.0811269,
                "lng": -71.64024049999999
              }
            },
            "location": {
              "lat": 42.0817647,
              "lng": -71.6387819
            },
            "location_type": "ROOFTOP",
            "viewport": {
              "northeast": {
                "lat": 42.0833866302915,
                "lng": -71.63740469999999
              },
              "southwest": {
                "lat": 42.08068866970851,
                "lng": -71.64024049999999
              }
            }
          },
          "place_id": "EikxNTggTiBNYWluIFN0ICMzLCBVeGJyaWRnZSwgTUEgMDE1NjksIFVTQSIdGhsKFgoUChIJ77_67ZMT5IkRDoiurmKucsUSATM",
          "types": [
            "subpremise"
          ]
        }
      ]
}
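
Both questions above (whether name is a unique key, and whether the file can be cut down to coordinates) could be handled with two small helpers. This is a sketch under assumptions: `trimGeo` expects the geo.json shape shown in the example, `findDuplicateNames` expects scraper results that each have a `name` field, and both function names are hypothetical.

```javascript
// Reduce { name: [geocoderResult, ...] } to { name: { lat, lng } },
// using the first geocoder result and skipping names that have none.
function trimGeo(geo) {
  const trimmed = {};
  for (const [name, results] of Object.entries(geo)) {
    const first = results && results[0];
    if (first && first.geometry && first.geometry.location) {
      const { lat, lng } = first.geometry.location;
      trimmed[name] = { lat, lng };
    }
  }
  return trimmed;
}

// Return the names that appear more than once in the scraper results.
function findDuplicateNames(results) {
  const seen = new Set();
  const duplicates = new Set();
  for (const { name } of results) {
    if (seen.has(name)) duplicates.add(name);
    seen.add(name);
  }
  return [...duplicates];
}
```

Running `findDuplicateNames` over the scraper output before writing geo.json would show whether name is safe to use as the key; if it is not, a key built from name plus address is one alternative.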

ramon-h commented 3 years ago

A slightly less accurate, but quick and easy, solution you could consider is a lookup table of distances between ZIP codes in Massachusetts. There's a CSV here where you can get all ZIP code pairings within 50 or 100 miles of each other. The file is pretty big by itself, but if you trimmed out ZIP codes outside of Massachusetts it would be much smaller. It has the benefit of creating no external dependency and requiring no work on a caching mechanism.
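
A lookup along these lines could be sketched as below. The column layout (`zip1,zip2,miles` with a header row) is an assumption, since the CSV's actual format is not shown in this thread, and the helper names are hypothetical.

```javascript
// Order-independent key so (a, b) and (b, a) share one entry.
function pairKey(zipA, zipB) {
  return zipA < zipB ? `${zipA}|${zipB}` : `${zipB}|${zipA}`;
}

// Build a lookup from CSV text with assumed columns: zip1,zip2,miles (header row first).
function buildZipDistances(csvText) {
  const distances = new Map();
  const rows = csvText.trim().split("\n").slice(1);
  for (const row of rows) {
    const [zip1, zip2, miles] = row.split(",").map((cell) => cell.trim());
    distances.set(pairKey(zip1, zip2), Number(miles));
  }
  return distances;
}

// Miles between two ZIP codes: 0 for the same ZIP code,
// undefined if the pair is not in the table (that is, beyond the table's cutoff).
function zipDistance(distances, zipA, zipB) {
  if (zipA === zipB) return 0;
  return distances.get(pairKey(zipA, zipB));
}
```

A missing pair can be treated as "too far", because the table only lists pairs within its 50 or 100 mile cutoff.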

dldereklee commented 3 years ago

@ramon-h I implemented the destination lat/long (https://github.com/livgust/covid-vaccine-scrapers/pull/72), so the only thing we need now is the source lat/long. That can be approximated with the browser's navigator.geolocation or from a user-entered ZIP code, and then used to calculate the user's distance to each site.

The only external dependency we have now is on the opendatasoft API, which converts the user-entered ZIP code to a corresponding lat/long coordinate. If we wanted to remove that dependency, we could just download the table from opendatasoft, which is about 536 rows for MA.
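
Resolving the user's position from either source could look roughly like this. It is a sketch only: `ZIP_CENTROIDS` stands in for the downloaded table and holds illustrative placeholder coordinates, and the function names are hypothetical. `navigator.geolocation.getCurrentPosition` is the standard browser API and only exists in a browser.

```javascript
// Hypothetical excerpt of a downloaded ZIP code table; the coordinates are
// illustrative placeholders, not values taken from opendatasoft.
const ZIP_CENTROIDS = {
  "01550": { lat: 42.08, lng: -72.03 },
  "01569": { lat: 42.07, lng: -71.64 },
};

// Look up a ZIP code locally; returns null when it is not in the table.
function lookupZip(zip) {
  return ZIP_CENTROIDS[zip] || null;
}

// Resolve the user's position: use the ZIP code when one is given,
// otherwise fall back to the browser's geolocation API.
function getUserLocation(zip) {
  return new Promise((resolve, reject) => {
    if (zip) {
      const hit = lookupZip(zip);
      return hit ? resolve(hit) : reject(new Error(`Unknown ZIP code: ${zip}`));
    }
    if (typeof navigator === "undefined" || !navigator.geolocation) {
      return reject(new Error("Geolocation is not available"));
    }
    navigator.geolocation.getCurrentPosition(
      (pos) => resolve({ lat: pos.coords.latitude, lng: pos.coords.longitude }),
      reject
    );
  });
}
```

Preferring the typed ZIP code over geolocation avoids a permission prompt when the user has already said where they are.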

harcod commented 3 years ago

I'm also working on this. @livgust asked me to use the "us-zips" library.

livgust commented 3 years ago

UX specs: [two images]