pallets / click

Python composable command line interface toolkit
https://click.palletsprojects.com
BSD 3-Clause "New" or "Revised" License

Encoding issue #448

Closed noxecane closed 8 years ago

noxecane commented 8 years ago

I was trying to use supervisor to start an application that uses click and I got this

Traceback (most recent call last):
  File "/mnt/store/www/apps/inet/pyenv/bin/inetserver", line 11, in <module>
    sys.exit(main())
  File "/mnt/store/www/apps/inet/pyenv/lib/python3.4/site-packages/click/core.py", line 700, in __call__
    return self.main(*args, **kwargs)
  File "/mnt/store/www/apps/inet/pyenv/lib/python3.4/site-packages/click/core.py", line 655, in main
    raise RuntimeError('Click will abort further execution '
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Either switch to Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps.

I tried setting the locale using

Supervisor
environment = LC_ALL="en_US.utf-8", LANG="en_US.utf-8" 
Bash
$ export LC_ALL="en_US.utf-8"
$ export LANG="en_US.utf-8"

All to no avail. Note that I am using a virtualenv and Python 3.4.

mitsuhiko commented 8 years ago

This is not a click problem but a python 3 issue. Either use Python 2.7 or reconfigure your environment to a locale supported on your machine with utf-8. There is nothing I can do in Click to fix this.

noxecane commented 8 years ago

The mitigation steps are not working (and I can't use Python 2). Isn't there a way to enforce the locale from within Python?

mitsuhiko commented 8 years ago

You cannot fix this from within Python. This is a problem with the interpreter. The only way to fix this is to reconfigure the environment.

noxecane commented 8 years ago

No issue......

mitsuhiko commented 8 years ago

@danceasarxx I'm not sure what you mean.

noxecane commented 8 years ago

I've given up on using it and decided to use plac. And that's how we say "no problem" down here.

mitsuhiko commented 8 years ago

In case someone else stumbles upon this in the future: exporting the locale is the correct way to fix it as otherwise stdout will have the wrong encoding set (and the fs encoding on linux). If exporting the values does not fix it, then most likely the locale is missing. In this case the machine might not have an en_US locale installed in which case Python falls back to C which again is ASCII.
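To see which encodings the interpreter actually derived from the environment, a small diagnostic along these lines may help. It is a minimal sketch, not Click's own code: the helper names `normalized`, `encoding_report` and `looks_ascii` are made up for illustration, and it only reports what the standard library exposes.

```python
import codecs
import locale
import sys


def normalized(name):
    """Return the canonical codec name, or None if the name is missing or unknown."""
    if not name:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def encoding_report():
    """Collect the encodings Python derived from the current environment."""
    return {
        "preferred": normalized(locale.getpreferredencoding()),
        "filesystem": normalized(sys.getfilesystemencoding()),
        "stdout": normalized(getattr(sys.stdout, "encoding", None)),
    }


def looks_ascii(report):
    """True when the locale's preferred encoding resolves to ASCII."""
    return report["preferred"] == "ascii"


if __name__ == "__main__":
    report = encoding_report()
    for key, value in report.items():
        print("{}: {}".format(key, value))
    if looks_ascii(report):
        print("The environment resolves to ASCII; check LANG / LC_ALL.")
```

Running this once in a normal shell and once under the supervisor or cron setup that fails makes it easier to tell whether the exported variables reached the process at all.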

Deepwalker commented 8 years ago

Indeed this is true, but your suggestion "Either switch to Python 2" is biased.

chhantyal commented 8 years ago

I guess this is documented here http://click.pocoo.org/5/python3/#python-3-surrogate-handling?

hynek commented 8 years ago

@danceasarxx the name of the locale is en_US.utf8 (without a hyphen). Click is protecting you from running a faulty setup here.

You can double check which locales are available using locale -a.
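The same check can be scripted. The following is a rough sketch that assumes a POSIX system where the `locale` command is on the PATH; the function names are hypothetical. It treats both the `utf8` and `UTF-8` spellings as UTF-8.

```python
import subprocess


def installed_locales():
    """Return the locale names printed by `locale -a` (POSIX systems only)."""
    raw = subprocess.check_output(["locale", "-a"], stderr=subprocess.DEVNULL)
    # Locale names may contain non-UTF-8 bytes, so decode leniently.
    return raw.decode("ascii", errors="replace").split()


def is_utf8_locale(name):
    """True if the codeset part of a locale name is some spelling of UTF-8."""
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


def utf8_locales(names):
    """Filter a list of locale names down to the UTF-8 ones."""
    return [name for name in names if is_utf8_locale(name)]


if __name__ == "__main__":
    candidates = utf8_locales(installed_locales())
    if candidates:
        print("UTF-8 locales available:", ", ".join(candidates))
    else:
        print("No UTF-8 locale is installed on this machine.")
```

If the list comes back empty, exporting `LC_ALL` or `LANG` cannot help until a UTF-8 locale is installed.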

mitsuhiko commented 8 years ago

@hynek on which system is utf-8 not a valid charset name in locales? I cannot reproduce that on any Linux build. It's more likely the system does not have utf-8 locales configured at all in that case.

wbolster commented 8 years ago

fyi, the reported locale names on a linux machine contain .utf8:

$ locale -a | grep -i en.*utf
en_GB.utf8
en_US.utf8

and on mac osx they contain .UTF-8:

$ locale -a | grep -i en.*utf
en_AU.UTF-8
en_CA.UTF-8
en_GB.UTF-8
en_IE.UTF-8
en_NZ.UTF-8
en_US.UTF-8

wbolster commented 8 years ago

Though linux machines accept utf-8 just fine:

$ date
Thu  5 Nov 10:48:09 UTC 2015
$ LANG=nl_NL.utf8 date
do nov  5 10:48:12 UTC 2015
$ LANG=nl_NL.UTF-8 date
do nov  5 10:48:18 UTC 2015

mitsuhiko commented 8 years ago

The error message is dynamic now and should help debugging this issue: https://github.com/mitsuhiko/click/blob/40705b9d69e78e599c26b8e55c828ae19bd5ed0c/click/_unicodefun.py#L42-L108

noxecane commented 8 years ago

@hynek Changing it to en_US.utf8 did work. Thanks a lot

mitsuhiko commented 8 years ago

@danceasarxx on which server operating system?

noxecane commented 8 years ago

CentOS 7. It's actually in a Docker container.

mitsuhiko commented 8 years ago

No idea why UTF8 would fix anything but if you are using docker you need to configure locales. By default docker boots up in ASCII mode and does not have any locales unless you run some custom image.

noxecane commented 8 years ago

On a base centos image

[root@7d3c22fe87eb inet]# locale -a
C
POSIX
en_US.utf8

alanfranz commented 8 years ago

Hello @mitsuhiko , I need one more clarification on the topic;

I think I understand quite a lot of things on encoding and Unicode issues (I even have a couple of blog posts on how it works in Python 2.x vs Java on my blog, if you need to check) - this is not to say that I know everything or that I'm the Most Competent Person In The World on the topic, but that I'm not an absolute beginner.

I have read the section on surrogate handling as well as the open bugs linked in that section.

Everywhere you keep saying "you need to configure locales" and you treat ASCII as if it were a disease.

AFAICU the issue is that, since string objects are actually unicode objects in Python 3.x, there's a struggle: if there's some non-ASCII char on the command line (or any string to be printed contains non-ASCII chars) then, effectively, you don't know what to do and how to handle the situation. This seems to be complicated even further by Python issue 8776 (sys.argv decoding).

By the way, I would understand the error 100% if it happened when sending non-ASCII chars to the command line, or when trying to print non-ASCII chars, or when detecting non-ASCII chars anywhere. THAT SITUATION would ABSOLUTELY require raising an exception.

At RUNTIME.

On the other hand... what if I'm 100% sure that everything is ASCII? Because I only used ASCII in my software, and it makes zero sense for users to employ other charsets?

Is that a "preventive war"? I refuse to work because for some arguments I might not work?

PS Of course I took the time to "hotpatch" click with

from click import core
core._verify_python3_env = lambda: None

And I found my app works as I expect, that's why I'm allowing myself to raise some doubts on this implementation.

As you note in the documentation, the issue is especially nasty in crontabs/init scripts/etc, places where you have ZERO user input (so it's quite controlled and you know there's nothing else than ASCII) and your output is most probably to a file where you log (and, there, you can choose the encoding you like). So, I cannot really find a purpose for such check.
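The runtime behaviour described above can be illustrated with an ASCII-encoded text stream standing in for stdout. This is only a sketch of the argument, with a hypothetical helper name `can_write`: ASCII-only text is written without trouble, while non-ASCII text fails at the moment of the write.

```python
import io


def can_write(text, encoding="ascii"):
    """Try to write text to an in-memory stream that uses the given encoding."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        # The failure surfaces only when offending characters are written.
        return False
    return True


if __name__ == "__main__":
    for sample in ("hello", "h\u00e9llo"):
        print(repr(sample), "ascii ok:", can_write(sample))
```

Whether failing late like this is preferable to refusing to start is exactly the design question under debate in this thread.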

mrocklin commented 6 years ago

Checking in here. It looks like some of the conversation here is a bit contentious, so I apologize for bringing this up again.

First, let me say, thank you so much @mitsuhiko for maintaining this library. I, and many others that I know, really appreciate your hard and unpaid work here.

I write and maintain a library that gets used on a lot of old supercomputing systems, many of which don't have locales set, and so my users run into this problem frequently. We've handled this situation so far with documentation and informative error messages, but still they persist in having issues. The problem is that many of our users aren't sufficiently sophisticated to address this issue on their own, and so it becomes a major pain point that I'm not sure how to address.

While I disagree with @alanfranz 's tone, I'm curious about his solution, and if there is a hack that I can do to opt-out of this verification. This is a compromise that I might be willing to make, and that I think would give my users a better experience.

However, I'm somewhat concerned about the reliance on the private function, and whether or not this function might move or disappear in the future. So, question for @mitsuhiko: is it reasonable for click to offer a long-term-stable opt-in mechanism for downstream library authors to disable this check?
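Until such a mechanism exists, one way to limit the risk is to apply the hotpatch defensively, so that nothing breaks if the private function is renamed or removed. This is a sketch under that assumption: `disable_env_check` is a hypothetical helper, and `_verify_python3_env` is the private name quoted earlier in this thread, which may not exist in a given Click version.

```python
def disable_env_check(module, name="_verify_python3_env"):
    """Replace module.<name> with a no-op if it exists.

    Returns True if the attribute was found and patched, False otherwise.
    """
    if callable(getattr(module, name, None)):
        setattr(module, name, lambda: None)
        return True
    return False


if __name__ == "__main__":
    try:
        from click import core
    except ImportError:
        print("click is not installed")
    else:
        # Private API: the return value tells us whether anything was patched.
        print("patched:", disable_env_check(core))
```

The return value lets a downstream library log when the patch no longer applies instead of crashing on import.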

alanfranz commented 6 years ago

@mitsuhiko @mrocklin Re-reading myself after a couple of years, I agree that my tone sounds confrontational, and I apologize about that. I don't remember being angry at the time, I was probably just trying to be a little too assertive.

My points hold, btw. I don't think there's an actual need for the pre-run checks that are being performed by click. It could just crash at runtime if something is wrong. I'd be happy to contribute a patch, if that will be looked at (i.e. if that won't be dismissed as 'it's a python3 problem just install locales').

exhuma commented 4 years ago

After several unsuccessful attempts fiddling about with the LC_* variables I dug around Google again. And via an answer on StackOverflow I came across PYTHONIOENCODING. And indeed, setting this to UTF-8 fixed the issue for my case.

In my case, I am running inside a docker container in a GitLab pipeline. And setting LC_ALL and LANG to C.UTF-8 did not help at all even though the locale is available:

$ locale -a
C
C.UTF-8
POSIX

Setting PYTHONIOENCODING finally fixed the issue for me.
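PYTHONIOENCODING overrides the encoding of the interpreter's standard streams. The sketch below starts a child interpreter with and without the variable and reports the encoding its stdout ends up with. The helper name `child_stdout_encoding` is made up for illustration, and the snippet only demonstrates the effect on `sys.stdout`; it does not show whether Click's own check passes.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Run a child interpreter and return the canonical name of its stdout encoding."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        universal_newlines=True,
    )
    # Normalize spellings such as "UTF-8" and "utf8" to one canonical name.
    return codecs.lookup(out.strip()).name


if __name__ == "__main__":
    print("without PYTHONIOENCODING:", child_stdout_encoding())
    print("with PYTHONIOENCODING=UTF-8:", child_stdout_encoding("UTF-8"))
```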

danihodovic commented 3 years ago

"No idea why UTF8 would fix anything but if you are using docker you need to configure locales. By default docker boots up in ASCII mode and does not have any locales unless you run some custom image."

I would guess that most containers don't have locales configured, especially ones that try to keep their size minimal. This makes click unusable in containerized environments.

https://github.com/pallets/click/issues/448#issuecomment-246029304

Thanks for the workaround @alanfranz