nismod / ukpopulation

Population and demographics projection module, developed for ITRC/MISTRAL
MIT License

NPP download bug (python3.5 only) #18

Closed: virgesmith closed this issue 6 years ago

virgesmith commented 6 years ago
$ python3 doc/example_variant.py
Cache directory:  /home/geoaps/.ukpopulation/cache
using cached LAD codes: /home/geoaps/.ukpopulation/cache/lad_codes.json
Loading NPP principal (ppp) data for England, Wales, Scotland & Northern Ireland
/home/geoaps/.ukpopulation/cache/NM_2009_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2009_1_0bcd330bc936cd7902566cf7198d8868.tsv
Cache directory:  /home/geoaps/.ukpopulation/cache
using cached LAD codes: /home/geoaps/.ukpopulation/cache/lad_codes.json
Collating SNPP data for England...
/home/geoaps/.ukpopulation/cache/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2006_1_1412780ddd715d804371850734000928.tsv
/home/geoaps/.ukpopulation/cache/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2006_1_a5b81a739b05970852420fdf22dd43c9.tsv
Collating SNPP data for Wales...
Collating SNPP data for Scotland...
Collating SNPP data for Northern Ireland...
using /home/geoaps/.ukpopulation/cache/npp_ni.zip
using /home/geoaps/.ukpopulation/cache/npp_wa.zip
using /home/geoaps/.ukpopulation/cache/npp_sc.zip
using /home/geoaps/.ukpopulation/cache/npp_en.zip
Extracting ni_hhh
Traceback (most recent call last):
  File "doc/example_variant.py", line 20, in <module>
    hhh = snpp.create_variant("hhh", npp, lad, range(start_year, end_year + 1))
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/snppdata.py", line 131, in create_variant
    scaling = npp.variant_ratio(variant_name, utils.country(geog_code), year_range).reset_index().sort_values(["C_AGE", "GENDER", "PROJECTED_YEAR_NAME"])
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 142, in variant_ratio
    num = self.detail(variant_numerator, geog, years, ages, genders).set_index(["C_AGE", "GENDER", "PROJECTED_YEAR_NAME"])
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 100, in detail
    self.__load_variant(variant_name)
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 222, in __load_variant
    vdata = np.array(_read_excel_xml(self.cache_dir + "/" + vxml, "Population"))
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 18, in _read_excel_xml
    if sheet["ss:Name"] == sheet_name:
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
KeyError: 'ss:Name'
virgesmith commented 6 years ago

Still happening, even with an empty cache directory.

virgesmith commented 6 years ago

It appears to be intermittent: it fails more often than not, but not every time, so it may affect other Python versions too. I can only assume it is down to undefined behaviour (UB) deep within bs4.
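If the failure really does depend on something nondeterministic in the interpreter, one way to test that is to pin `PYTHONHASHSEED`, which fixes Python's string hash randomisation, and rerun the script once per seed. This is an assumption about the cause, not something established in this thread. If the same seeds fail on every rerun, the bug is tied to hash-dependent ordering rather than to the download or the cache. The helper `failure_seeds` is hypothetical, and `subprocess.run` requires Python 3.5 or later.

```python
import os
import subprocess
import sys


def failure_seeds(cmd, seeds):
    """Run cmd once per PYTHONHASHSEED value and return the seeds whose run failed.

    Hypothetical diagnostic helper: a failure that tracks the seed (rather than
    appearing at random) points at hash-order-dependent behaviour.
    """
    failed = []
    for seed in seeds:
        # Copy the current environment and pin the string hash seed for the child process
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        result = subprocess.run(cmd, env=env,
                                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failed.append(seed)
    return failed
```

For example, from the repository root, `failure_seeds([sys.executable, "doc/example_variant.py"], range(20))` lists the failing seeds. Running it twice and getting the same list would support the hash-ordering hypothesis.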

virgesmith commented 6 years ago

This is probably related to this XML namespace issue, where BeautifulSoup4 removes namespace definitions: https://stackoverflow.com/questions/37195992/beautifulsoup4-removes-namespace-definitions-from-schema-in-wsdl
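The crash comes from looking up the worksheet attribute by its prefixed name (`sheet["ss:Name"]`), so one possible workaround is to match on the namespace URI instead of the prefix. The sketch below swaps bs4 for the standard library's `xml.etree.ElementTree`, which exposes a namespaced attribute as `{uri}Name` whatever prefix the file uses. It assumes the NPP variant files are Excel 2003 XML (SpreadsheetML) using the `urn:schemas-microsoft-com:office:spreadsheet` namespace. `read_excel_xml` here is a hypothetical replacement, not ukpopulation's actual `_read_excel_xml`. It avoids f-strings so that it still runs on Python 3.5.

```python
import xml.etree.ElementTree as ET

# Namespace URI of Excel 2003 XML (SpreadsheetML). Assumption: as is typical,
# both the default namespace and the "ss" prefix are bound to this URI.
SS_NS = "urn:schemas-microsoft-com:office:spreadsheet"


def _q(local):
    """Build an ElementTree qualified name, e.g. {urn:...}Worksheet."""
    return "{%s}%s" % (SS_NS, local)


def read_excel_xml(source, sheet_name):
    """Return the rows of the named worksheet as lists of cell strings.

    source may be a filename or a binary file object. Matching uses the
    namespace URI, so it does not matter which prefix the file uses.
    Note: empty cells and ss:Index column skips are not handled, so sparse
    rows will have their columns shifted left.
    """
    root = ET.parse(source).getroot()
    for sheet in root.iter(_q("Worksheet")):
        if sheet.get(_q("Name")) == sheet_name:
            return [[data.text or "" for data in row.iter(_q("Data"))]
                    for row in sheet.iter(_q("Row"))]
    raise KeyError("worksheet not found: %s" % sheet_name)
```

Unlike the bs4 version, a missing sheet raises a `KeyError` that names the sheet, rather than failing on an attribute lookup.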