Closed: thisisaaronland closed this issue 7 years ago
Go version:
bbox,cessation,country_id,deprecated,file_hash,fullname,geom_hash,geom_latitude,geom_longitude,id,inception,iso,iso_country,lastmodified,lbl_latitude,lbl_longitude,locality_id,name,parent_id,path,placetype,region_id,source,superseded_by,supersedes,wof_country
"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,af44eef000cacfd97df9503ee4a73cc2,,abce907b16f80b1b5db03b47890c420e,38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0.000000,0.000000,85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,venue,85688741,simplegeo,,,US
Python version:
bbox,cessation,country_id,deprecated,file_hash,fullname,geom_hash,geom_latitude,geom_longitude,id,inception,iso,iso_country,lastmodified,lbl_latitude,lbl_longitude,locality_id,name,parent_id,path,placetype,region_id,source,superseded_by,supersedes,wof_country
"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,af44eef000cacfd97df9503ee4a73cc2,,dbaf05a1fcde87a276a5fd6a816b174e,38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0,0,85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,venue,85688741,simplegeo,,,US
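To see exactly which columns disagree between the two rows, something like the following works. It is only a sketch: `differing_columns` is a hypothetical helper written for this comment, not part of either tool, and the rows are pasted verbatim from above.

```python
import csv
import io

HEADER = "bbox,cessation,country_id,deprecated,file_hash,fullname,geom_hash,geom_latitude,geom_longitude,id,inception,iso,iso_country,lastmodified,lbl_latitude,lbl_longitude,locality_id,name,parent_id,path,placetype,region_id,source,superseded_by,supersedes,wof_country"
GO_ROW = '"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,af44eef000cacfd97df9503ee4a73cc2,,abce907b16f80b1b5db03b47890c420e,38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0.000000,0.000000,85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,venue,85688741,simplegeo,,,US'
PY_ROW = '"-77.027931,38.956027,-77.027931,38.956027",uuuu,85633793,,af44eef000cacfd97df9503ee4a73cc2,,dbaf05a1fcde87a276a5fd6a816b174e,38.956027,-77.027931,353611525,uuuu,US,US,1491273460,0,0,85931779,Metro-American Tax Svc,85841475,353/611/525/353611525.geojson,venue,85688741,simplegeo,,,US'


def differing_columns(header, row_a, row_b):
    """Return the column names whose values differ between two CSV rows."""
    text = "\n".join([header, row_a, row_b])
    # DictReader yields one dict per data row; there are exactly two here.
    a, b = csv.DictReader(io.StringIO(text))
    return [col for col in a if a[col] != b[col]]


print(differing_columns(HEADER, GO_ROW, PY_ROW))
```

For these two rows the differences are `geom_hash`, `lbl_latitude` and `lbl_longitude`. The label columns differ only in number formatting (`0.000000` vs. `0`); `geom_hash` is the one this issue is about.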
Basically the problem is that Go serializes the geom as {"coordinates":[-77.027931,38.956027],"type":"Point"}
and Python serializes it as {"type": "Point", "coordinates": [-77.027931, 38.956027]}
Because we can't have nice things...
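A quick way to convince yourself that the serialization alone explains the mismatch is to hash the two strings quoted above directly. This is a sketch in Python 3 (so the strings are encoded to bytes first); `md5_hex` is a hypothetical helper, and it assumes both tools hash the UTF-8 bytes of the serialized geometry with MD5.

```python
import hashlib

# The two serializations quoted above, character for character.
go_style = '{"coordinates":[-77.027931,38.956027],"type":"Point"}'
py_style = '{"type": "Point", "coordinates": [-77.027931, 38.956027]}'


def md5_hex(text):
    """MD5 hex digest of the UTF-8 encoding of text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Same geometry, different bytes, so the digests cannot be expected to match.
print(md5_hex(go_style))
print(md5_hex(py_style))
print(md5_hex(go_style) == md5_hex(py_style))
```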
From the Go docs:
Map values encode as JSON objects. The map's key type must either be a string, an integer type, or implement encoding.TextMarshaler. The map keys are sorted and used as JSON object keys by applying the following rules, subject to the UTF-8 coercion described for string values above:
The file itself exports: "geometry": {"coordinates":[-77.027931,38.956027],"type":"Point"}
Okay, so the problem is in the Python code. What we should have been doing was:
>>> import hashlib, json
>>> g = json.dumps(geom, sort_keys=True, separators=(',', ':'))
>>> hash = hashlib.md5()
>>> hash.update(g)
>>> print hash.hexdigest()
abce907b16f80b1b5db03b47890c420e
It is tempting to monkey-patch the Go code to mimic the Python output, but it will always be 50/50 whether the keys are sorted, and it's easier to say "sort all the keys, no extra whitespace", so we'll just live with the history and move forward...
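For reference, the "sort all the keys, no extra whitespace" rule can be sketched as a small Python 3 function. `hash_geom_canonical` is a hypothetical name used here for illustration; it is not the actual `mapzen.whosonfirst.utils.hash_geom` implementation. It only addresses key order and whitespace and does not try to normalise number formatting.

```python
import hashlib
import json


def hash_geom_canonical(geom):
    """MD5 of a geometry serialized with sorted keys and no extra whitespace."""
    canonical = json.dumps(geom, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# The same geometry built with keys in two different orders.
a = {"type": "Point", "coordinates": [-77.027931, 38.956027]}
b = {"coordinates": [-77.027931, 38.956027], "type": "Point"}

# Both produce the same digest because the keys are sorted before hashing.
print(hash_geom_canonical(a) == hash_geom_canonical(b))
```

With sorted keys and compact separators, the intermediate string for this geometry is the same one the Go code writes: `{"coordinates":[-77.027931,38.956027],"type":"Point"}`.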
Just for a good laugh... https://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00
>>> import mapzen.whosonfirst.utils
>>> f = mapzen.whosonfirst.utils.load_file("/usr/local/data/whosonfirst-data-venue-us-dc/data/353/611/525/353611525.geojson")
>>> mapzen.whosonfirst.utils.hash_geom(f)
'abce907b16f80b1b5db03b47890c420e'
null is encoded as an empty string
wof-venue-latest.csv vs. wof-venue-us-ca-latest.csv
For example: