lefnire / jobpig

Matchmaking Job Board
GNU Affero General Public License v3.0
21 stars 8 forks

Improve location tag #1

Open lefnire opened 8 years ago

lefnire commented 8 years ago

All job attributes are stored as Tags, including location. Currently location is saved simply as a string, e.g. "San Francisco, CA". There are two problems:

  1. These strings aren't normalized; they're scraped per job and can be anything. If you currently search "San..." you'll see "San Francisco, CA", "San Francisco", "SF/Bay Area", "SF, CA, USA", etc. So we need to normalize locations (so "San Francisco, CA" can only ever be one location tag).
  2. There's no locational awareness for radius search preferences (no lat/lng information).

I've looked into the Google Maps Geocoding API and other geocoding APIs; we can't use these due to terms issues (will explain if desired). Luckily we're using Postgres, which has a PostGIS Tiger Geocoder extension for just this purpose. I had a helluva time setting it up, and if I'm not mistaken it only applies to the USA? If I'm wrong, and we can set it up, we could store location tags as lat/lng tuples for (1) location normalization; (2) location radius scoring.
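For reference, here is a rough sketch of what radius scoring could look like once tags carry coordinates. It assumes each location is a plain `(lat, lng)` tuple in degrees and uses the haversine great-circle formula; `haversine_km` and `within_radius` are hypothetical names, not anything in the codebase, and PostGIS would normally do this in the database instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) tuples in degrees."""
    lat1, lng1 = map(math.radians, a)
    lat2, lng2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(job_location, preferred_location, radius_km):
    """Hypothetical helper: True if the job is inside the user's search radius."""
    return haversine_km(job_location, preferred_location) <= radius_km


# Illustrative coordinates only (approximate city centers).
san_francisco = (37.7749, -122.4194)
oakland = (37.8044, -122.2712)
new_york = (40.7128, -74.0060)
```

With something like this, a job in Oakland would match a 25 km radius preference around San Francisco, while one in New York would not.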

But let's punt on Geocoder for now, and as a short-term solution simply dump all world cities into our database via the AdWords cities CSV (Creative Commons). Then we'll prevent creating any new location tags, since they're all there. This will solve the normalization issue (not the radius issue).
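A minimal sketch of the "no new location tags" idea, assuming the preloaded locations are available as a list of canonical names. `build_index` and `resolve_location` are hypothetical helpers for illustration; the matching here is only case- and whitespace-insensitive, so free-form variants like "SF/Bay Area" are rejected rather than mapped.

```python
def build_index(canonical_names):
    """Map a case-folded key to the single canonical spelling of each location."""
    return {name.casefold(): name for name in canonical_names}


def resolve_location(raw, index):
    """Return the canonical tag for a scraped string, or None if it is unknown.

    Returning None (instead of creating a tag) is what keeps the set closed.
    """
    return index.get(raw.strip().casefold())


# Illustrative data only; the real list would come from the imported CSV.
index = build_index(["San Francisco, CA", "Oakland, CA"])
```

Here `resolve_location("  san francisco, ca ", index)` would give back the one canonical tag, while an unrecognized string yields `None` and no tag is created.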


Some technical notes for using adwords.csv:

We'll want to filter out too-small administrative divisions (see "Target Type"). I'm not sure which ones to use besides City, Country, Province, State (ideas?)
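The filter described above could be sketched roughly like this. The column names ("Canonical Name", "Target Type", etc.) and the sample rows are assumptions for illustration, so check them against the actual file; the allowed set is just the four types listed above and is meant to be adjusted.

```python
import csv
import io

# Target types to keep; taken from the list above, open to change.
ALLOWED_TARGET_TYPES = {"City", "Country", "Province", "State"}

# Made-up sample rows standing in for adwords.csv (column names are assumed).
SAMPLE_CSV = '''Criteria ID,Name,Canonical Name,Country Code,Target Type
1,San Francisco,"San Francisco,California,United States",US,City
2,California,"California,United States",US,State
3,Example Neighborhood,"Example Neighborhood,San Francisco,California,United States",US,Neighborhood
4,United States,United States,US,Country
'''


def filter_locations(csv_text, allowed=ALLOWED_TARGET_TYPES):
    """Parse the CSV text and keep only rows whose Target Type is allowed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"name": row["Canonical Name"], "type": row["Target Type"]}
        for row in reader
        if row["Target Type"] in allowed
    ]


locations = filter_locations(SAMPLE_CSV)
```

In this sample the neighborhood row is dropped and the city, state, and country rows are kept.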

Process: (1) parse / filter the CSV (see above); (2) upload to the database (returning values); (3) store the results (along with id) to locations.json; (4) copy/paste said file to the client, so it can be used by both client & server. Reasons for this procedure:

lefnire commented 8 years ago

https://github.com/lefnire/jobpig/tree/lefnire/locations