openaddresses / machine

Scripts for running OpenAddresses on a complete data set and publishing the results.
http://results.openaddresses.io/
ISC License

Added support for geojson files using "application/vnd.geo+json" as the mime type #763

Closed MiniCodeMonkey closed 4 years ago

MiniCodeMonkey commented 4 years ago

Currently, sources served with the application/vnd.geo+json MIME type cannot be processed, because the downloaded file is assigned a somewhat-random file extension instead of .json.

This change adds that MIME type to the list of known MIME types in Python's mimetypes library.
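
As a rough illustration only (a sketch, not the actual diff from this pull request), registering the type with the standard-library mimetypes module might look like the following. The names GEOJSON_MIME and register_geojson_type are hypothetical and do not come from the openaddr code base.

```python
import mimetypes

# Hypothetical names for illustration; not taken from the openaddr code base.
GEOJSON_MIME = "application/vnd.geo+json"


def register_geojson_type():
    """Map the vendor GeoJSON MIME type to the .json extension."""
    # add_type() extends the module-level table consulted by
    # guess_extension() and guess_all_extensions().
    mimetypes.add_type(GEOJSON_MIME, ".json")


register_geojson_type()

# After registration, .json is among the extensions known for this type.
extensions = mimetypes.guess_all_extensions(GEOJSON_MIME)
print(extensions)
```

Whether guess_extension() returns .json as the first choice can depend on what the local system's MIME tables already contain, so the sketch only checks the full list of candidates.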

Here's an example of a source that fails because of the incorrect file extension.

     1       316.1  DEBUG: Requesting https://planning.kimberley.ca/wms/layers?service=WFS&version=1.3.0&request=GetFeature&typename=Addresses&outputFormat=application%2Fvnd.geo%2Bjson with args None
     1       673.6  DEBUG: Content-Type says "application/vnd.geo+json" for https://planning.kimberley.ca/wms/layers?service=WFS&version=1.3.0&request=GetFeature&typename=Addresses&outputFormat=application%2Fvnd.geo%2Bjson
     1       684.1  DEBUG: file says "text/plain" for https://planning.kimberley.ca/wms/layers?service=WFS&version=1.3.0&request=GetFeature&typename=Addresses&outputFormat=application%2Fvnd.geo%2Bjson
     1       685.1  DEBUG: Guessed addresses-primary-9671f806.ksh for https://planning.kimberley.ca/wms/layers?service=WFS&version=1.3.0&request=GetFeature&typename=Addresses&outputFormat=application%2Fvnd.geo%2Bjson
     1       687.9  DEBUG: Requesting https://planning.kimberley.ca/wms/layers?service=WFS&version=1.3.0&request=GetFeature&typename=Addresses&outputFormat=application%2Fvnd.geo%2Bjson with args None
     1      1869.2   INFO: Downloaded 3625535 bytes for file vol/process_one-h9fjzu9x/cache-_htujzwk/http/addresses-primary-9671f806.ksh
     1      1905.6   INFO: Cached data in file:///vol/process_one-h9fjzu9x/cached/addresses-primary-9671f806.ksh
     1      2042.1  DEBUG: URL says ".ksh" for file:///vol/process_one-h9fjzu9x/cached/addresses-primary-9671f806.ksh
     1      2042.8  DEBUG: Guessed addresses-primary-50d27459.ksh for file:///vol/process_one-h9fjzu9x/cached/addresses-primary-9671f806.ksh
     1      2160.8  DEBUG: File exists vol/process_one-h9fjzu9x/conform-6wuikgg3/http/addresses-primary-50d27459.ksh
     1      2161.7   INFO: Downloaded to ['vol/process_one-h9fjzu9x/conform-6wuikgg3/http/addresses-primary-50d27459.ksh']
     1      2162.8 WARNING: Could not guess a single compression from file names
     1      2163.4   INFO: Decompressed to 1 files
     1      2164.2 WARNING: Error doing excerpt; skipping
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/openaddr/__init__.py", line 162, in conform
    _L.info("Sampled %d records", len(data_sample))
TypeError: object of type 'NoneType' has no len()
     1      2165.1  DEBUG: Converting to vol/process_one-h9fjzu9x/conform-6wuikgg3
     1      2167.3 WARNING: No JSON found in ['vol/process_one-h9fjzu9x/conform-6wuikgg3/http/addresses-primary-50d27459.ksh']
     1      2167.8 WARNING: Found no addresses in source data
     1      2183.3 WARNING: Nothing processed
     1      2330.1   INFO: Wrote to state: vol/city_of_kimberley/addresses/primary/index.json
vol/city_of_kimberley/addresses/primary/index.json
     1      2343.8   INFO: Resource usage: { user: 1%, system: 1%, memory: 557MB, read: 14188KB, written: 8KB, sent: 54KB, received: 3690KB, period: 2sec, procs: 1 }
     1      2344.1   INFO: process shutting down
     1      2344.2  DEBUG: running all "atexit" finalizers with priority >= 0
     1      2344.2  DEBUG: running the remaining "atexit" finalizers
MiniCodeMonkey commented 4 years ago

Source depending on this change is here: https://github.com/openaddresses/openaddresses/pull/4798

MiniCodeMonkey commented 4 years ago

@iandees Is this cool to be merged in?

iandees commented 4 years ago

Yep, thanks!

migurski commented 4 years ago

It looks good to me! I am a bit confused about why the tests didn't run, but it’s a small enough change that I’m not worried about them passing.

MiniCodeMonkey commented 4 years ago

Thanks!