nlextract / NLExtract

Convert (ETL) and visualize free Dutch geo-datasets.
https://nlextract.nl
GNU General Public License v3.0

Docker process fails on gemeentelijke indeling generation: openpyxl package missing #338

Closed (justb4 closed this issue 2 years ago)

justb4 commented 2 years ago

Running the BAGV2 ETL with the latest Docker image.

2021-12-22 16:04:53,505 chain INFO Assembling Chain: input_string_file|command_executor...
2021-12-22 16:04:53,506 input INFO cfg = {'class': 'stetl.inputs.fileinput.StringFileInput', 'file_path': 'data/cbs/gemeentelijke-indeling_command.txt'}
2021-12-22 16:04:53,507 fileinput INFO file_list=['data/cbs/gemeentelijke-indeling_command.txt']
2021-12-22 16:04:53,511 chain INFO Running Chain: input_string_file|command_executor
2021-12-22 16:04:53,511 fileinput INFO Read/parse for start for file=data/cbs/gemeentelijke-indeling_command.txt....
2021-12-22 16:04:53,511 fileinput INFO Read/parse ok for file=data/cbs/gemeentelijke-indeling_command.txt
2021-12-22 16:04:53,511 fileinput INFO all files done
2021-12-22 16:04:53,512 execfilter INFO executing cmd=../../bag/bin/gemeentelijke-indeling.sh --convert-to-csv -i ../../bag/db/data/gemeentelijke-indeling.xml -o data/cbs/\
gemeentelijke-indeling.csv

Traceback (most recent call last):
  File "/nlx/bag/src/gemeentelijke-indeling.py", line 47, in <module>
    import openpyxl
ModuleNotFoundError: No module named 'openpyxl'

I think we need to add openpyxl to requirements.txt and python3-openpyxl to the packages installed in the Dockerfile. Will try that first.

justb4 commented 2 years ago

Ok, now the next error:

2021-12-22 16:25:56,117 execfilter INFO executing cmd=../../bag/bin/gemeentelijke-indeling.sh --convert-to-csv -i ../../bag/db/data/gemeentelijke-indeling.xml -o data/cbs/\
gemeentelijke-indeling.csv

Traceback (most recent call last):
  File "/nlx/bag/src/gemeentelijke-indeling.py", line 48, in <module>
    import xlrd
ModuleNotFoundError: No module named 'xlrd'

The fix may be similar: add xlrd==2.0.1 to requirements.txt and python3-xlrd to the packages installed in the Dockerfile.
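
Since these import errors only show up one at a time, halfway through the ETL run, a small preflight check could list every missing package in one go. This is only a sketch and not part of NLExtract: the helper name `missing_modules` is made up for illustration, and the module list is simply the two imports that failed above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located.

    Hypothetical helper for illustration only. find_spec() returns None
    when a top-level module is not installed, without importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# The two modules that gemeentelijke-indeling.py failed to import in the logs above.
required = ["openpyxl", "xlrd"]
missing = missing_modules(required)
if missing:
    print("Missing Python packages: " + ", ".join(missing))
else:
    print("All required packages are importable")
```

Running something like this as a build step in the Docker image would surface both missing packages at build time instead of during the ETL.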

sebastic commented 2 years ago

xlrd is required for gemeentelijke-indeling.py since the switch to Python 3 (#276).

openpyxl is required for gemeentelijke-indeling.py since CBS started using the XLSX format for 2021 (#294).
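
To illustrate why both packages are needed side by side: xlrd 2.x reads only the legacy .xls format, while openpyxl reads .xlsx. A minimal sketch of picking the library by file extension follows. This is an assumption for illustration, not necessarily how gemeentelijke-indeling.py decides; the function name `reader_module_for` and the file names are hypothetical.

```python
from pathlib import Path


def reader_module_for(path):
    """Return the name of the library able to read the given spreadsheet.

    Hypothetical helper for illustration only; the real script may
    choose its reader differently.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".xlsx":
        return "openpyxl"  # XLSX, used by CBS from 2021 onwards per the comment above
    if suffix == ".xls":
        return "xlrd"      # legacy XLS; xlrd 2.x supports only this format
    raise ValueError("Unsupported spreadsheet format: " + str(path))


# Hypothetical file names, for illustration only.
for name in ["indeling-2020.xls", "indeling-2021.xlsx"]:
    print(name, "->", reader_module_for(name))
```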

justb4 commented 2 years ago

OK, but this was not reflected in the dependencies for either Python or Docker. Now it is, and the ETL is working, so closing.