opencultureconsulting / openrefine-client

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
GNU General Public License v3.0

unicode issues in --list and --info when piping stdout #15

Closed: felixlohmeier closed this issue 3 years ago

felixlohmeier commented 3 years ago

This bug does not affect the one-file executables.

example

[felix@tux openrefine-client]$ python2 refine.py --download "https://git.io/fj5hF" --output=duplicates.csv
Download to file duplicates.csv complete
[felix@tux openrefine-client]$ python2 refine.py --create duplicates.csv --projectName "biểu tượng cảm xúc ⛲"
id: 1915256050198
rows: 10
[felix@tux openrefine-client]$ python2 refine.py --list
 1915256050198: biểu tượng cảm xúc ⛲
[felix@tux openrefine-client]$ python2 refine.py --list > test
Traceback (most recent call last):
  File "refine.py", line 35, in <module>
    __main__.main()
  File "/home/felix/git/openrefine-client/google/refine/__main__.py", line 237, in main
    cli.ls()
  File "/home/felix/git/openrefine-client/google/refine/cli.py", line 219, in ls
    print(u'{0:>14}: {1}'.format(project_id, project_info['name']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1ec3' in position 18: ordinal not in range(128)

solution from http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python

Whenever you read data from outside your app, expect it to be bytes - eg, of type str - and call .decode() on it to interpret it as text. Likewise, always call .encode() on text you want to send to the outside world.
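
Applied to the traceback above: under Python 2, when stdout is redirected to a file or pipe, `sys.stdout.encoding` is `None`, so `print` falls back to the ASCII codec and fails on the first non-ASCII character. A minimal sketch of the "encode on the way out" advice follows. The helper names `format_project_line` and `write_line` are hypothetical and not part of openrefine-client, and UTF-8 as the output encoding is an assumption; this is not necessarily how the fix was implemented in the repository.

```python
# -*- coding: utf-8 -*-
import sys


def format_project_line(project_id, name):
    # Build the line as text (unicode), same format string as in cli.ls()
    return u'{0:>14}: {1}'.format(project_id, name)


def write_line(text, stream=None, encoding='utf-8'):
    # Hypothetical helper: encode explicitly instead of relying on the
    # encoding Python guesses for stdout (which is ASCII under Python 2
    # when stdout is piped).
    if stream is None:
        # Python 3 exposes the byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, 'buffer', sys.stdout)
    stream.write((text + u'\n').encode(encoding))
```

With this, `write_line(format_project_line(project_id, project_info['name']))` would write the same bytes whether stdout is a terminal or a file. Setting the environment variable `PYTHONIOENCODING=utf-8` before running `python2 refine.py --list > test` is another workaround that needs no code change.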