okfn / ibp-explorer

[ARCHIVED] Data Explorer for the Open Budget Survey, built in collaboration with the International Budget Partnership.
http://survey.internationalbudget.org

2019 data update issues #124

Closed amercader closed 4 years ago

amercader commented 4 years ago
Reading OBS2019_QuestionsNumbers+Text.xlsx...
Reading GroupingsOBSQuestions2019.xlsx...
Reading OBI 2019.xlsx...
Reading Public Availability 2019.xlsx...
Reading public participation.xlsx...
Traceback (most recent call last):
  File "etl.py", line 169, in <module>
    run_etl(DEFAULT_OUTPUT, DEFAULT_DOWNLOADFOLDER)
  File "etl.py", line 150, in run_etl
    dataset_2019 = lib_read.read(iso_data, datafiles, '2019')
  File "/home/adria/dev/pyenvs/ibp/src/ibp-explorer/data/lib_read.py", line 48, in read
    _read_groupings(g_workbook, datafiles['g_xlsx_qsheet'])
  File "/home/adria/dev/pyenvs/ibp/src/ibp-explorer/data/lib_read.py", line 180, in _read_groupings
    'qs': _parse_int_list(_lookup(sheet, 3, y)),
  File "/home/adria/dev/pyenvs/ibp/src/ibp-explorer/data/lib_read.py", line 311, in _parse_int_list
    for s in int_list.replace(' ', '').split(','):
AttributeError: 'NoneType' object has no attribute 'replace'

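In the GroupingsOBSQuestions2019.xlsx file, QuestionsGroups 2019 and 2017 sheet, there are some extra rows starting with "Note: We would also like to show the following indicators:". Their question-list cell is empty, so `_parse_int_list` receives `None` and fails. These rows need to be moved somewhere else (for example, to a separate sheet).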
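
If editing the spreadsheet is not an option, the reader could tolerate empty cells instead. This is only a minimal sketch: `parse_int_list` is a hypothetical stand-in for `_parse_int_list` in lib_read.py, whose real body is not shown here beyond the single line in the traceback.

```python
def parse_int_list(int_list):
    """Parse a cell value such as '1, 2, 3' into a list of ints.

    Hypothetical variant of lib_read._parse_int_list (assumption: the real
    function expects a comma-separated string). An empty cell, which the
    spreadsheet reader returns as None, yields an empty list instead of
    raising AttributeError.
    """
    if int_list is None:
        return []
    # Drop spaces, split on commas, and skip empty fragments
    # (e.g. from a trailing comma or an empty string).
    return [int(s) for s in int_list.replace(' ', '').split(',') if s]
```

Returning an empty list hides the stray rows rather than reporting them, so this is a trade-off: raising an error that names the offending row may be the safer choice if silent gaps in the groupings would be a problem.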

--

python etl.py
Reading OBS2019_QuestionsNumbers+Text.xlsx...
Reading GroupingsOBSQuestions2019.xlsx...
Reading OBI 2019.xlsx...
Reading Public Availability 2019.xlsx...
/home/adria/dev/pyenvs/ibp/local/lib/python2.7/site-packages/openpyxl/reader/worksheet.py:322: UserWarning: Unknown extension is not supported and will be removed
  warn(msg)
Reading public participation.xlsx...
Traceback (most recent call last):
  File "etl.py", line 169, in <module>
    run_etl(DEFAULT_OUTPUT, DEFAULT_DOWNLOADFOLDER)
  File "etl.py", line 150, in run_etl
    dataset_2019 = lib_read.read(iso_data, datafiles, '2019')
  File "/home/adria/dev/pyenvs/ibp/src/ibp-explorer/data/lib_read.py", line 53, in read
    datafiles['av_xlsx_sheets'])
  File "/home/adria/dev/pyenvs/ibp/src/ibp-explorer/data/lib_read.py", line 243, in _read_availability
    '[%s] I have no ISO-3116 mapping for country name "%s". Please add one to the ISO mappings file.' % (name, name)
AssertionError: [='raw 2019'!A2] I have no ISO-3116 mapping for country name "='raw 2019'!A2". Please add one to the ISO mappings file.

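In the Public Availability 2019.xlsx file, 2019 sheet, the country-name cells appear to contain formulas (e.g. `='raw 2019'!A2`) rather than plain values. The script reads the formula text as the country name, so the ISO mapping lookup fails.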
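
A check along these lines could make the failure easier to diagnose by reporting the formula explicitly, before the ISO lookup runs. It is a sketch under assumptions: `is_formula` and `check_country_name` are hypothetical helpers, not functions from lib_read.py.

```python
# Works on both Python 2 (where cell text may be unicode) and Python 3.
try:
    string_types = basestring  # Python 2
except NameError:
    string_types = str  # Python 3


def is_formula(value):
    """Return True if a cell value looks like an unevaluated formula."""
    return isinstance(value, string_types) and value.startswith('=')


def check_country_name(value, cell_ref):
    """Return the value unchanged, or raise if it is a formula.

    cell_ref is a label for the cell (e.g. 'A2'), used only in the message.
    """
    if is_formula(value):
        raise ValueError(
            'Cell %s contains a formula (%s), not a country name. '
            'Replace the formulas with plain values.' % (cell_ref, value)
        )
    return value
```

For example, `check_country_name("='raw 2019'!A2", 'A2')` raises a `ValueError` that names the cell. An alternative that avoids editing the file is to open the workbook with `openpyxl.load_workbook(path, data_only=True)`, which returns the values cached when the file was last saved instead of the formulas, assuming the installed openpyxl version supports that option and the file was saved by an application that stores those cached values.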

amercader commented 4 years ago

@chris48s I think this is as far as I got with the ETL script ^