nextzen / metro-extracts

Extracts of OpenStreetMap data.

Maintain history on the cities.geojson file #3

Open iandees opened 6 years ago

iandees commented 6 years ago

@migurski suggested that I keep the history of the cities.geojson file. This is a great idea as there are a ton of different contributors to it.

Here's how I might do it:

  1. Check out the old repo to a new place
  2. Use git filter-branch from here to grab only the commits that touch the cities.json file into a branch.
  3. Add this stripped-out local repo as a remote to this repo (on my machine)
  4. git fetch that remote
  5. git branch repo1 remotes/repo1/master from here to create a new branch with the history from the cities.json file
  6. Run the .json to .geojson script and check that modification into this new branch, also renaming the file
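
Steps 3 to 5 above can be sketched as a small dry-run helper. This is only an illustration: the remote name `repo1` and the `master` branch come from the steps as written, while the path `/path/to/stripped-repo` and the function names are hypothetical. The filter-branch step and the .json to .geojson script are not covered, since their details are not given here.

```python
import shlex
import subprocess


def history_import_commands(stripped_repo, remote="repo1", branch="master"):
    """Build the git commands for steps 3-5 as argument lists."""
    return [
        # Step 3: add the stripped-out local repo as a remote
        ["git", "remote", "add", remote, stripped_repo],
        # Step 4: fetch its history
        ["git", "fetch", remote],
        # Step 5: create a local branch that carries the fetched history
        ["git", "branch", remote, f"remotes/{remote}/{branch}"],
    ]


def run_all(commands, cwd=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


# Hypothetical path; nothing is executed in dry-run mode.
run_all(history_import_commands("/path/to/stripped-repo"))
```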

migurski commented 6 years ago

Here’s a stripped-out version of the repository: https://github.com/migurski/metro-extracts/tree/migurski/prepare-nextzen

I ran git filter-branch twice:

I tried a run of this with --prune-empty set but found that it was too aggressive, and removed some past contributors’ commits.

iandees commented 6 years ago

👏 thanks for doing this!

drewda commented 6 years ago

Hi guys, FYI, after suggestions from @migurski on Twitter, @irees did this in https://github.com/interline-io/osm-extracts/blob/master/cities.json

How about we find a common place where both projects can source the same GeoJSON file? Slightly different pipeline internals and output formats -- but we probably have the same notion of metros/regions in mind.

iandees commented 6 years ago

Yeah, good idea @drewda. I'm happy to use yours if you'd like. I do have plans to support polygons rather than just bounding boxes. Does the tool you're using support that? It's OK if not; maybe we could store the bounding box separately.

irees commented 6 years ago

@iandees Yes, I plan to support polygons. Currently I generate a simple bbox based on the polygon geometry, but I am going to add full polygon clipping support (it's using osmconvert). Once per day, after updating the planet, I spin up a temporary ~200-CPU cluster and do the extracts in about 30 minutes; I'm working on a blog post describing it.
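
Deriving a simple bbox from a polygon geometry could look roughly like the sketch below. It is not the code used in the pipeline described above; the function names are made up for illustration, and it assumes GeoJSON-style `coordinates` in longitude, latitude order with no antimeridian crossing.

```python
def iter_positions(coords):
    """Yield every [lon, lat, ...] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def bbox_of(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON geometry."""
    positions = list(iter_positions(geometry["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return (min(lons), min(lats), max(lons), max(lats))


# Illustrative polygon only, not a real metro boundary.
example = {
    "type": "Polygon",
    "coordinates": [[
        [-93.4, 44.8], [-93.0, 44.8], [-93.0, 45.1], [-93.4, 45.1], [-93.4, 44.8],
    ]],
}
print(bbox_of(example))
```

Because the helper walks arbitrarily nested coordinate lists, the same function works for MultiPolygon geometries.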

migurski commented 6 years ago

FWIW, I was pretty intentional about using bboxes instead of polygons in 2011: people were using extracts to do local renders, and we had a large extra chunk of space around each metro center to ensure that you could zoom out and still see a meaningful map. Having a big fringe or even merging neighboring metros (like Tijuana and San Diego) was a goal, rather than clipping polygonal areas restricted to specific town boundaries.
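
The two ideas here, a generous fringe around each metro and merging neighboring metros into one box, can be sketched as follows. The function names, the padding fraction, and the coordinates are all hypothetical; the original extracts may have chosen their margins by hand.

```python
def pad_bbox(bbox, fraction):
    """Grow a (min_x, min_y, max_x, max_y) box by a fraction of its size on each side."""
    min_x, min_y, max_x, max_y = bbox
    dx = (max_x - min_x) * fraction
    dy = (max_y - min_y) * fraction
    return (min_x - dx, min_y - dy, max_x + dx, max_y + dy)


def merge_bboxes(a, b):
    """Return the smallest box that covers both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


# Made-up boxes standing in for two neighboring metros.
north = (-117.3, 32.6, -116.9, 33.0)
south = (-117.2, 32.4, -116.8, 32.6)
combined = pad_bbox(merge_bboxes(north, south), 0.25)
print(combined)
```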

drewda commented 6 years ago

I'm happy to use yours if you'd like.

@iandees sounds good. Feel free to edit the interline-io/osm-extracts readme, or I can do that, to link off to your extracts when they're ready to be made public. Also, if you need to access the GeoJSON file with Content-Type headers and all, we reverse-proxy it out at http://app.interline.io/osm_extracts/bounding_boxes.geojson (with 1 hour caching behind the scenes).

FWIW, I was pretty intentional about using bboxes instead of polygons in 2011

@migurski thanks for the background. Enforcing a bounding box makes a lot of sense when the primary application is tile rendering. For the routing engine graphs that @irees and I build from our extracts, we don't have those constraints. Still, it's hard to imagine situations that would warrant non-convex polygons or other odd shapes.

@iandees are there specific regions you have in mind for switching from bboxes to polygons?

iandees commented 6 years ago

are there specific regions you have in mind for switching from bboxes to polygons?

I don't have any particular need for it right now, but I wanted to let people submit pull requests for polygons.

Two thoughts:

  1. Metro areas like Minneapolis/St Paul that are more round than square. On the other hand, including the extra bits in the corners of a rectangular version of a metro area doesn't really increase the size of the extract that much.
  2. We could generate extracts for continents or countries.
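
As a rough check on the first point, here is the area overhead of a square box around a perfectly circular metro. This is idealized geometry only: it assumes a flat plane and says nothing about data density, which is what actually drives extract size (the corners are likely sparser than the center).

```python
import math


def bbox_overhead(radius=1.0):
    """Extra area of the bounding square, as a fraction of the circle's area."""
    circle_area = math.pi * radius ** 2
    square_area = (2 * radius) ** 2
    return square_area / circle_area - 1.0


# 4 / pi - 1, independent of the radius
print(f"{bbox_overhead():.1%}")
```

So under these assumptions the corners add a bit over a quarter more area than the circle itself.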

drewda commented 6 years ago

@iandees sounds good. Feel free to open PRs against the cities.json whenever you want. I can also give you push access, so we have more hands to handle any incoming PRs.

drewda commented 6 years ago

Related: we now have CircleCI running JSON Schema validation, to decrease the chances of improperly formed GeoJSON entering either of our systems: https://github.com/interline-io/osm-extracts/blob/master/cities-schema.json
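
For readers without the schema at hand, the kind of structural check involved can be approximated in plain Python. The sketch below is not the linked cities-schema.json and does not reproduce its rules; it only tests the basic FeatureCollection shape, and it assumes every feature must carry both a geometry and a properties object (stricter than GeoJSON itself, which allows null for either).

```python
def check_feature_collection(doc):
    """Return a list of problems found; an empty list means the shape looks OK."""
    if not isinstance(doc, dict) or doc.get("type") != "FeatureCollection":
        return ["top level must be an object with type 'FeatureCollection'"]
    features = doc.get("features")
    if not isinstance(features, list):
        return ["'features' must be a list"]

    errors = []
    for i, feature in enumerate(features):
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            errors.append(f"features[{i}]: type must be 'Feature'")
            continue
        geometry = feature.get("geometry")
        if (not isinstance(geometry, dict)
                or "type" not in geometry
                or "coordinates" not in geometry):
            errors.append(f"features[{i}]: geometry needs 'type' and 'coordinates'")
        if not isinstance(feature.get("properties"), dict):
            errors.append(f"features[{i}]: properties must be an object")
    return errors


# Illustrative document with one well-formed and one malformed feature.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [0, 0]},
         "properties": {"name": "example"}},
        {"type": "Feature", "geometry": None, "properties": {}},
    ],
}
for problem in check_feature_collection(sample):
    print(problem)
```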