waldoj / frostline

A dataset, API, and parser for USDA plant hardiness zones.
https://phzmapi.org/
MIT License
149 stars 25 forks source link

Create a GeoJSON processing pipeline #32

Open waldoj opened 7 years ago

waldoj commented 7 years ago
waldoj commented 7 years ago

http://www.prism.oregonstate.edu/projects/public/phm/phm_us_shp.zip http://www.prism.oregonstate.edu/projects/public/phm/phm_ak_shp.zip http://www.prism.oregonstate.edu/projects/public/phm/phm_pr_shp.zip http://www.prism.oregonstate.edu/projects/public/phm/phm_hi_shp.zip

ogr2ogr -f GeoJSON -t_srs crs:84 phm_us_test.geojson phm_us_shp.shp

waldoj commented 7 years ago

This isn't working. The shapefiles use polygons, reasonably, so instead of having one file per point, we have one file per polygon. That's no good. It may be easier to go back to #31 and figure out how to get that projection working.

waldoj commented 7 years ago

I'm trying using SpatiaLite (the SQLite geodata extension), and loading the shapefile directly in there. The thinking is to then issue ~9M queries, one for each coordinate pair, creating a GeoJSON record for each. The catch is that each query for the metadata for a given point requires about 1.5 seconds, or 23 weeks of queries. This could be parallelized, but that hardly seems worth the trouble.

SpatiaLite supports spatial indices, but they're just not working for me. e.g.:

spatialite> SELECT CreateSpatialIndex('phz', 'Geometry');
CreateSpatialIndex() error: "no such table: geometry_columns"
0

There are no useful Google results when searching for information about this error. I've tried creating the table, but then it demands a particular columns, and presumably this would continue until I built an adequate table? Anyway, that doesn't seem like a functioning approach.

waldoj commented 7 years ago

I guess I could always go full AWS and use Postgres with PostGIS extensions, on RDS, to do this. It'd surely be faster. But I don't want to get that AWS lock-in.

waldoj commented 7 years ago

I'm trying out Postgres/PostGIS, using these data-loading instructions. That's gone fine so far (with the surprise requirement that I have to run CREATE EXTENSION postgis up front), but my efforts to perform a point-in-polygon query have so far failed.

waldoj commented 7 years ago

Ah-ha—this works:

SELECT gid, id, gridcode, zone
FROM conus
WHERE ST_Contains(
    geom, ST_Transform(
        ST_GeomFromText(
            'POINT(-78.2 38.5)', 4326)
        ,4269)
    )=true;

This is running 19–25 ms per query. That comes to 122 hours for 20 million queries, or just over 5 days, which seems pretty manageable.

Spot-checking a couple of queries, the geometry appears to be fine, presumably thanks to the 4326/4269 conversion.

I feel good about this.

webmaven commented 7 years ago

Any progress?

waldoj commented 7 years ago

No, this was going really badly, for reasons that I didn't document and cannot remember, though in my defense I was on some very powerful painkillers for the entire week, and it's kind of amazing that I wrote any useful code at all?

Coincidentally, I was working on this again last night, returning to where I was at with #31 before I closed it. I think the proper thing to do is to solicit help in adjusting the projection that doomed that approach, instead of moving to an entirely different approach. I'll have to set up some test data, document the problem, and tweet about it, in hopes that some clever geodata folks can propose a solution.

webmaven commented 6 years ago

I'll have to set up some test data, document the problem, and tweet about it, in hopes that some clever geodata folks can propose a solution.

Any progress?