open-contracting / ocdskit

A suite of command-line tools for working with OCDS data
https://ocdskit.readthedocs.io
BSD 3-Clause "New" or "Revised" License

package-releases: Crash on Windows #147

Closed duncandewhurst closed 4 years ago

duncandewhurst commented 4 years ago

Noted whilst testing the OCDS Kit Learning Lab on Windows 10.

The package-releases command crashes with the following traceback:

Traceback (most recent call last):
  File "C:\Users\Duncan\AppData\Local\Programs\Python\Python38-32\Scripts\ocdskit-script.py", line 11, in <module>
    load_entry_point('ocdskit==0.2.9', 'console_scripts', 'ocdskit')()
  File "c:\users\duncan\appdata\local\programs\python\python38-32\lib\site-packages\ocdskit\cli\__main__.py", line 61, in main
    command.handle()
  File "c:\users\duncan\appdata\local\programs\python\python38-32\lib\site-packages\ocdskit\cli\commands\package_releases.py", line 26, in handle
    self.print(output, streaming=True)
  File "c:\users\duncan\appdata\local\programs\python\python38-32\lib\site-packages\ocdskit\cli\commands\base.py", line 84, in print
    print(chunk, end='')
  File "c:\users\duncan\appdata\local\programs\python\python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-4: character maps to <undefined>

I tried specifying the encoding of the source files (utf-8) but got the same error.

For reference I used the following commands:

type afghanistan_*.json | ocdskit package-releases > afghanistan_release_package.json

and

type afghanistan_*.json | ocdskit --encoding utf-8 package-releases > afghanistan_release_package.json

with two files downloaded from the Afghanistan OCDS API: file 1, file 2

jpmckinney commented 4 years ago

Did you run set PYTHONIOENCODING=utf-8 as instructed here? https://ocdskit.readthedocs.io/en/latest/cli.html
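To check whether the variable takes effect, here is a minimal sketch (not part of OCDS Kit) that starts a child Python process with PYTHONIOENCODING set and asks it which encoding it uses for standard output. The helper name child_stdout_encoding is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding: str) -> str:
    """Run a child Python with PYTHONIOENCODING set and return its sys.stdout.encoding."""
    # Copy the current environment and override only the one variable.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


reported = child_stdout_encoding("utf-8")
print(reported)
```

If the child reports utf-8, commands run in a shell with the same variable set should write UTF-8 to standard output instead of falling back to the console code page.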

I see cp1252 in the traceback. That is a Windows encoding, and it breaks lots of Python CLI tools.

The problem is basically that OCDS Kit is providing UTF-8 data, but Python knows that the shell's encoding is cp1252, so it tries to encode the data as cp1252 and errors on any character that cp1252 cannot represent.
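The failure can be reproduced without Windows. This is a rough sketch: it writes a string through a text stream that uses cp1252, the same way print() writes through sys.stdout, and then repeats the write with UTF-8. The sample text and the helper name write_as are assumptions for illustration; they are not taken from the Afghanistan files or from OCDS Kit.

```python
import io

# Assumed sample: Arabic-script characters, which cp1252 has no mapping for.
text = '{"title": "\u0627\u0641\u063a\u0627\u0646"}'


def write_as(encoding: str, chunk: str) -> bytes:
    """Print chunk to an in-memory text stream that encodes with the given encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="strict")
    print(chunk, end="", file=stream)
    stream.flush()
    return raw.getvalue()


# With cp1252 the write fails, as in the traceback above.
try:
    write_as("cp1252", text)
except UnicodeEncodeError as error:
    cp1252_error = error
else:
    cp1252_error = None

# With UTF-8 the same write succeeds.
utf8_bytes = write_as("utf-8", text)
```

Setting PYTHONIOENCODING=utf-8 has the effect of the second call: standard output is opened with UTF-8, so every character in the data can be encoded.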

duncandewhurst commented 4 years ago

Ah, I missed that. Problem solved.