whosonfirst / go-whosonfirst-meta

Go package for working with Who's On First meta files
BSD 3-Clause "New" or "Revised" License

Minimal viable Python replacement #1

Closed thisisaaronland closed 7 years ago

thisisaaronland commented 7 years ago

For example:

less /tmp/wof-venue-latest.csv

null,37.699383,uuuu,uuuu,�,Cardinal Jewelers,simplegeo,,US,�,0.000000,null,�,�,US,null,,US,-121.872286,0.000000,�,�
null,38.273667,uuuu,uuuu,�,Tafb Child Center 2,simplegeo,,US,�,0.000000,null,�,�,US,null,,US,-121.948160,0.000000,�,�
null,32.746029,uuuu,uuuu,�,Cable Technologies,simplegeo,,US,�,0.000000,null,�,�,US,null,,US,-117.011191,0.000000,�,�
thisisaaronland commented 7 years ago

Go version:

bbox,cessation,country_id,deprecated,file_hash,fullname,geom_hash,geom_latitude,geom_longitude,id,inception,iso,iso_country,lastmodified,lbl_latitude,lbl_longitude,locality_id,name,parent_id,path,placetype,region_id,source,superseded_by,supersedes,wof_country
"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,af44eef000cacfd97df9503ee4a73cc2,,abce907b16f80b1b5db03b47890c420e,38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0.000000,0.000000,85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,venue,85688741,simplegeo,,,US

Python version:

bbox,cessation,country_id,deprecated,file_hash,fullname,geom_hash,geom_latitude,geom_longitude,id,inception,iso,iso_country,lastmodified,lbl_latitude,lbl_longitude,locality_id,name,parent_id,path,placetype,region_id,source,superseded_by,supersedes,wof_country
"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,af44eef000cacfd97df9503ee4a73cc2,,dbaf05a1fcde87a276a5fd6a816b174e,38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0,0,85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,venue,85688741,simplegeo,,,US

https://github.com/whosonfirst/py-mapzen-whosonfirst-utils/blob/master/mapzen/whosonfirst/utils/__init__.py#L35-L42
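The two rows above are long enough that the differences are easy to miss by eye. A minimal sketch for spotting them, assuming Python 3 and the header and rows copied verbatim from this comment; `parse_row` and `differing_columns` are hypothetical helper names, not part of either package:

```python
import csv
import io

HEADER = ("bbox,cessation,country_id,deprecated,file_hash,fullname,geom_hash,"
          "geom_latitude,geom_longitude,id,inception,iso,iso_country,lastmodified,"
          "lbl_latitude,lbl_longitude,locality_id,name,parent_id,path,placetype,"
          "region_id,source,superseded_by,supersedes,wof_country")

GO_ROW = ('"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,'
          'af44eef000cacfd97df9503ee4a73cc2,,abce907b16f80b1b5db03b47890c420e,'
          '38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0.000000,0.000000,'
          '85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,'
          'venue,85688741,simplegeo,,,US')

PY_ROW = ('"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,'
          'af44eef000cacfd97df9503ee4a73cc2,,dbaf05a1fcde87a276a5fd6a816b174e,'
          '38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0,0,'
          '85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,'
          'venue,85688741,simplegeo,,,US')


def parse_row(header, row):
    # Hypothetical helper: parse one CSV row into a dict keyed by column name.
    reader = csv.DictReader(io.StringIO(header + "\n" + row + "\n"))
    return next(reader)


def differing_columns(a, b):
    # Hypothetical helper: column names whose values differ, in header order.
    return [key for key in a if a[key] != b[key]]


go_rec = parse_row(HEADER, GO_ROW)
py_rec = parse_row(HEADER, PY_ROW)
diff = differing_columns(go_rec, py_rec)

for col in diff:
    print(col, go_rec[col], py_rec[col])
```

On these two rows the differing columns are geom_hash, lbl_latitude and lbl_longitude (0.000000 versus 0).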

thisisaaronland commented 7 years ago

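Basically the problem is that Go serializes the geom as {"coordinates":[-77.027931,38.956027],"type":"Point"} and Python serializes it as {"type": "Point", "coordinates": [-77.027931, 38.956027]}, so the two strings hash to different geom_hash values (abce907b... in the Go row, dbaf05a1... in the Python row).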

Because we can't have nice things...
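To make the mismatch concrete, here is a small sketch (assuming Python 3) that hashes the two strings quoted above exactly as written. `md5_hex` is a hypothetical helper, not a function from either code base. Both strings decode to the same geometry, but MD5 runs over the bytes, so key order and whitespace change the digest:

```python
import hashlib
import json

# The two serializations quoted in the comment above, copied verbatim.
go_style = '{"coordinates":[-77.027931,38.956027],"type":"Point"}'
py_style = '{"type": "Point", "coordinates": [-77.027931, 38.956027]}'


def md5_hex(text):
    # Hypothetical helper: MD5 hex digest of the UTF-8 bytes of a string.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


same_geometry = json.loads(go_style) == json.loads(py_style)
same_hash = md5_hex(go_style) == md5_hex(py_style)

print("same geometry:", same_geometry)
print("same hash:", same_hash)
```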

thisisaaronland commented 7 years ago

From the Go docs:

Map values encode as JSON objects. The map's key type must either be a string, an integer type, or implement encoding.TextMarshaler. The map keys are sorted and used as JSON object keys by applying the following rules, subject to the UTF-8 coercion described for string values above:

https://golang.org/pkg/encoding/json/

thisisaaronland commented 7 years ago

The file itself exports: "geometry": {"coordinates":[-77.027931,38.956027],"type":"Point"}

thisisaaronland commented 7 years ago

Okay, so the problem is in the Python code. What we should have been doing was:

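>>> import hashlib, json
>>> geom = {"type": "Point", "coordinates": [-77.027931, 38.956027]}
>>> # sort the keys and drop the whitespace before hashing
>>> g = json.dumps(geom, sort_keys=True, separators=(',', ':'))
>>> hash = hashlib.md5()
>>> hash.update(g.encode("utf-8"))
>>> print(hash.hexdigest())
abce907b16f80b1b5db03b47890c420e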

It is tempting to monkey-patch the Go code to mimic the Python stuff, but it will always be 50/50 whether the keys are sorted, and it's easier to say "sort all the keys, no extra whitespace", so we'll just live with the history and move forward...
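The "sort all the keys, no extra whitespace" rule can be sketched as a single function. This is an illustration assuming Python 3; `canonical_geom_hash` is a hypothetical name and not necessarily how the library's own function is written. The point is that the digest no longer depends on the order in which the keys happen to sit in the dict:

```python
import hashlib
import json


def canonical_geom_hash(geom):
    # Hypothetical helper: serialize with sorted keys and no extra
    # whitespace, then take the MD5 of the UTF-8 bytes.
    encoded = json.dumps(geom, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(encoded.encode("utf-8")).hexdigest()


# Same geometry, keys inserted in a different order.
type_first = {"type": "Point", "coordinates": [-77.027931, 38.956027]}
coords_first = {"coordinates": [-77.027931, 38.956027], "type": "Point"}

encoded = json.dumps(type_first, sort_keys=True, separators=(",", ":"))
print(encoded)
print(canonical_geom_hash(type_first) == canonical_geom_hash(coords_first))
```

This only pins down key order and whitespace. Number formatting can still differ between encoders (Python writes the float 1.0 as 1.0, for example, where Go's encoding/json writes 1), so it is a convention rather than a full canonical form.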

thisisaaronland commented 7 years ago

Just for a good laugh... https://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

thisisaaronland commented 7 years ago

As of: https://github.com/whosonfirst/py-mapzen-whosonfirst-utils/commit/3c57eb5bc075fb5d4371c510d69fba56ef0e2ab3

>>> import mapzen.whosonfirst.utils
>>> f = mapzen.whosonfirst.utils.load_file("/usr/local/data/whosonfirst-data-venue-us-dc/data/353/611/525/353611525.geojson")
>>> mapzen.whosonfirst.utils.hash_geom(f)
'abce907b16f80b1b5db03b47890c420e'