mapbox / preprocessorcerer

Perform preprocessorcery and pick parts on particularly persnickety uploads
ISC License

[WIP] Add geojson-stacker preprocessor #72

Closed davidtheclark closed 5 years ago

davidtheclark commented 8 years ago

Very much a WIP ATM.

The goal here is to incorporate stacker-of-worlds to transform GeoJSON such that all coordinates fall within longitudes +-180. The ultimate goal of this is to prevent breakdowns in other tools further down the processing pipeline.
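For context, the core of the ±180 transform can be sketched as below. This is not stacker-of-worlds' actual API (which also has to split geometries that cross the antimeridian, not just shift coordinates); `wrapLongitude` and `wrapPosition` are illustrative names.

```javascript
// Minimal sketch of the longitude-wrapping idea: shift any longitude
// into the [-180, 180] range. Real stacking must additionally split
// features that cross the antimeridian after shifting.
function wrapLongitude(lng) {
  // ((x + 180) mod 360) - 180, using a positive modulo so that
  // negative inputs (e.g. -190) also land in range
  return ((((lng + 180) % 360) + 360) % 360) - 180;
}

function wrapPosition(position) {
  var lng = position[0];
  var lat = position[1];
  return [wrapLongitude(lng), lat];
}
```

For example, `wrapLongitude(190)` gives `-170` and `wrapLongitude(-190)` gives `170`, so all output coordinates fall within longitudes ±180 as described above.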

I am facing a grave difficulty, though, and seek the counsel of this library's maintainers.

Right now the tests fail in a mysterious way. The source of the failure seems to be that, within the bowels of stacker-of-worlds, Turf fails on one of the existing test fixtures when it tries to perform an intersection calculation. JSTS, behind Turf, throws the old classic `unable to assign a hole to a shell` error. I looked that up, and it probably means something is slightly wonky about the geometry: just wonky enough to break the calculation but not, apparently, to break other processing.

So this raises the question: if the geojson-stacker preprocessor is going to fail on some geometries (because they're invalid GeoJSON, or we can't assign holes to their shells, etc.), what do we do? We could just skip that preprocessing step, but then we end up with inconsistently processed end results. And what do we tell the user? "Can't assign a hole to your shell"?
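One possible shape for the "skip on failure" option is a pass-through wrapper like the hypothetical sketch below; `stackFeature` is a placeholder for whatever transform stacker-of-worlds ultimately exposes, not a real function in this repo.

```javascript
// Hypothetical fallback wrapper: if the stacking transform throws
// (e.g. JSTS's "unable to assign a hole to a shell"), pass the feature
// through unchanged instead of failing the whole upload, and record
// that it was skipped so the caller can report it.
function tryStack(feature, stackFeature) {
  try {
    return { feature: stackFeature(feature), stacked: true };
  } catch (err) {
    return { feature: feature, stacked: false, error: err.message };
  }
}
```

The trade-off is exactly the one raised above: the upload succeeds, but the output mixes stacked and unstacked features.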

Thoughts?

davidtheclark commented 8 years ago

cc @rclark (anybody else you think I should cc?)

rclark commented 8 years ago

If the geojson-stacker preprocessor is going to fail on some geometries ... what do we do?

The unfortunate reality of this processing chain is that we have to accept everything that gets thrown into it. The expectation is that the preprocessorcerer makes non-lossy data transformations, with the primary goal of improving mapnik performance when it comes time to cut the data into tiles.

As I understand it, this PR will just skip features that it can't process? That seems like it would make downstream handling of out-of-bounds data even more confusing.