mapbox / mapnik-omnivore

Node module that returns metadata about spatial files.
45 stars 19 forks source link

Collect all feature's properties from GeoJSON file #162

Closed who8mycakes closed 7 years ago

who8mycakes commented 7 years ago

Implementing https://github.com/mapbox/mapnik-omnivore/issues/127.

This PR's objectives are:

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling b6814d39cfd00febf8bd180e6e845419ba682492 on more-features into on master.

who8mycakes commented 7 years ago

I used a very large GeoJSON file (1.3GB) to get a baseline for processing time before making changes to geojson.js. The result was: 1m32.501s.

After increasing num_to_features_query to 10000, the processing time increased to 1m44.331s, which took an additional 11.83s.

Running through processing a couple of times to account for cache-like behavior: (before = num_to_features_query = 5; after = num_to_features_query = 10000)

before after Delta
10s 850ms 12s 496ms 1s 646ms
10s 833ms 12s 983ms 2s 150ms
9s 910ms 11s 579ms 1s 669ms
9s 963ms 11s 474ms 1s 511ms
9s 921ms 11s 148ms 1s 227ms
Average 10s 295ms 11s 936ms 1s 641ms

Noting that the GeoJSON file I was benchmarking has 710,145 features, I increased num_to_features_query to 1M:

1M --- | 2m 40s 977ms 1m 34s 712ms 39s 481ms 40s 457ms 42s 411ms 1m 9s 917ms 1m 0s 780ms

who8mycakes commented 7 years ago

Pre-implementation of changes: start time [Fri, 03 Feb 2017 20:04:42 GMT] end time: [Fri, 03 Feb 2017 20:05:57 GMT] (1m 15s)

Here are the properties that were pulled from the test geojson file:


22 properties | This layer contains mostly Points
geo_longitude Number
Shape String
Height String
Site String
Wall String
City String
River String
Nickname String
Avenue String
Destination String
Lake String
Province String
Town String
Address String
Terrain String
location String
Country String
Title String
geo_latitude Number
Builder String
Square String
Boulevard String

These are the properties from the first 5 features, as the parameters are currently set.
coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling c74757d5487a69a30e859feea488c45c8a1bd6c8 on more-features into on master.

who8mycakes commented 7 years ago

@GretaCB As you review, I'd be interested to see if you think I could make this section of my test a bit more elegant: https://github.com/mapbox/mapnik-omnivore/pull/162/files#diff-50dbdad073e034de4a02c9507d0541cc

GretaCB commented 7 years ago

@who8mycakes great question. Not sure if you'll need quite as extensive of a setup as mapbox-upload-validate, but it uses an /expected dir to hold all expected json and other error strings. So you can store the expected json in a file, then use it like so, or whatever makes sense based on how you set it up.

who8mycakes commented 7 years ago

Here are the PRs I've set up for testing on staging:

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling b02137a4f6c10db788821ceb2caee8047d453cdf on more-features into on master.

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling 00191031752ff22503cbd23b3d3782d618de977b on more-features into on master.

who8mycakes commented 7 years ago

I tested the same fixture on staging, and met with success as all 35 unique properties of the features were captured:

start time: [Tue, 07 Feb 2017 23:18:21 GMT] end time: [Tue, 07 Feb 2017 23:19:47 GMT] (1m 26s)

cc @GretaCB @mapsam @springmeyer


35 properties | This layer contains mostly Points
Port String
Parliament String
geo_longitude Number
Shape String
Film String
Material String
Height String
Nation String
Site String
Capital String
Wall String
City String
Border String
River String
Nickname String
Avenue String
Destination String
Sea String
Abode String
Lake String
Province String
Town String
Neighbor String
Address String
Terrain String
location String
Ruler String
Country String
Title String
Ally String
geo_latitude Number
Builder String
Square String
Boulevard String
Group String
GretaCB commented 7 years ago

@who8mycakes woot! 🎉 Nice work 22 properties --> 35 properties

Are the processing times in staging pre/post inline with your benchmarking?

coveralls commented 7 years ago

Coverage Status

Changes Unknown when pulling de535db930e161491f5b8e39a3f75d9f367cf272 on more-features into on master.

coveralls commented 7 years ago

Coverage Status

Coverage remained the same at 95.247% when pulling f36822c185ff8cec721f09ab63e7d1316ff8d2a3 on more-features into bfa6dc2a1e4b909577d20850a138a56feb8cbbf2 on master.