vmolsa / psutil

Automatically exported from code.google.com/p/psutil

UnicodeDecodeError on Danish Linux #476

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
If proc.name or proc.cmdline contains non-ASCII characters, the following error results:

    arg_name = os.path.basename(proc.cmdline[1]) if proc.cmdline else None
  File "/usr/lib/python3.3/site-packages/psutil/__init__.py", line 402, in cmdline
    return self._platform_impl.get_process_cmdline()
  File "/usr/lib/python3.3/site-packages/psutil/_pslinux.py", line 463, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/lib/python3.3/site-packages/psutil/_pslinux.py", line 531, in get_process_cmdline
    return [x for x in f.read().split('\x00') if x]
  File "/usr/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 92: ordinal not in range(128)

It seems wrong to decode with the 'ascii' codec, because name and cmdline may contain non-ASCII
characters.

Original issue reported on code.google.com by l...@hupfeldtit.dk on 12 Feb 2014 at 1:00

GoogleCodeExporter commented 9 years ago
Mmmm, I fear this is going to be a nasty one.
Could you please paste the output of the following commands?

$ python -c "import sys; print(sys.getfilesystemencoding())"
$ echo $LC_ALL
$ echo $LANG

Also, it would be interesting to see how ps represents those commands, so could
you paste the relevant part(s) of the ps output as well?
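The three shell commands above can also be checked from a single Python process. Below is a minimal diagnostic sketch; the helper name encoding_report is hypothetical. Beyond those three values, it also reports locale.getpreferredencoding(False), the encoding that open() uses in text mode when no encoding argument is given, and LC_CTYPE, which also influences that value.

```python
import locale
import os
import sys


def encoding_report():
    """Hypothetical helper: gather encoding-related settings in one place."""
    return {
        # Encoding Python uses for file names and command-line arguments.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Default encoding for open() in text mode when encoding= is omitted.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Locale variables; None means the variable is unset.
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        "LANG": os.environ.get("LANG"),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```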

Original comment by g.rodola on 12 Feb 2014 at 4:50

GoogleCodeExporter commented 9 years ago
I finally got around to reproducing the error.
It does not require a non-English Linux setup, although it is unlikely to happen with an
English login.
I did not have an LC_ALL environment variable, but I did have LC_CTYPE and LANG set.

To reproduce:
Unset LC_CTYPE and LANG (both must be unset).
Run the attached æøåÅ.sh.
While that is running, run the attached psutil_test.py.
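The attached scripts are not included in this export. The sketch below is a hypothetical stand-in for them, not the reporter's actual æøåÅ.sh or psutil_test.py, and it is Linux only. It starts a child process whose last argument is æøåÅ.sh and returns the raw contents of that child's /proc/<pid>/cmdline. The point it illustrates is that the kernel hands back plain bytes, so any decoding, and therefore any locale dependence, happens in the code that reads the file.

```python
import subprocess
import sys

NAME = "æøåÅ.sh"  # the non-ASCII argument from the report


def raw_cmdline_of_child(arg=NAME):
    """Hypothetical stand-in for the attachments (Linux only): start a
    sleeping child whose argv ends with `arg` and return the raw bytes of
    its /proc/<pid>/cmdline."""
    child = subprocess.Popen(
        # Passing bytes (explicit UTF-8) keeps the child's argv independent
        # of this interpreter's own filesystem encoding.
        [sys.executable, "-c", "import time; time.sleep(30)", arg.encode("utf-8")]
    )
    try:
        # Popen returns only after exec() succeeded, so this is the child's
        # final command line. Binary mode: no decoding happens here.
        with open("/proc/%d/cmdline" % child.pid, "rb") as f:
            return f.read()
    finally:
        child.terminate()
        child.wait()


if __name__ == "__main__":
    print(raw_cmdline_of_child().split(b"\x00"))
```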

Original comment by l...@hupfeldtit.dk on 14 Feb 2014 at 11:24

Attachments:

GoogleCodeExporter commented 9 years ago
OK, I can reproduce the problem. If the correct encoding is set in the shell,
Python 2.x returns a byte string (because the file is opened in binary mode), while
Python 3.x reports the correct cmdline (because text mode is the default):

giampaolo@UX32VD:~/svn/psutil$ python2.7 -c "import psutil; print(psutil.Process().cmdline())" æøåÅ.sh
['python2.7', '-c', 'import psutil; print(psutil.Process().cmdline())', '\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh']

giampaolo@UX32VD:~/svn/psutil$ python3.4 -c "import psutil; print(psutil.Process().cmdline())" æøåÅ.sh
['python3.4', '-c', 'import psutil; print(psutil.Process().cmdline())', 'æøåÅ.sh']
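The byte string printed by Python 2.7 is simply the UTF-8 encoding of æøåÅ.sh. A quick check: decoding it as UTF-8 gives back the original name, while the ascii codec (the one in the reported traceback) already fails on the first byte, 0xc3.

```python
# The last element printed by Python 2.7 in the session above.
raw = b"\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh"

# Decoded as UTF-8, it is exactly the argument that was passed.
decoded = raw.decode("utf-8")
print(decoded)  # æøåÅ.sh

# Decoded with the ascii codec, the very first byte (0xc3) is rejected.
error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is unbound after the except block
    print(exc)
```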

If the correct encoding is not set, we get the same byte string on Python 2.x
and a UnicodeDecodeError on Python 3.x.
I'm not sure what's best to do here.
I think we should always open the file in text mode on both Python versions so
that we return the correct value.
On the other hand, I'm not sure how best to handle encoding errors.
Python provides different options for dealing with them:
http://docs.python.org/3.4/library/functions.html#open
We could use errors='ignore' or errors='replace', although I don't like
imposing such a decision on users.
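The error handlers mentioned above can be compared directly on the bytes from this report. The sketch below does that. It also includes 'surrogateescape', a built-in handler that the comment does not mention, only to show a lossless alternative to 'ignore' and 'replace'.

```python
raw = "æøåÅ.sh".encode("utf-8")  # 8 non-ASCII bytes followed by ".sh"

# errors="strict" (the default) raises UnicodeDecodeError, as in the report.

# errors="ignore" silently drops every undecodable byte.
ignored = raw.decode("ascii", errors="ignore")

# errors="replace" substitutes U+FFFD for each undecodable byte.
replaced = raw.decode("ascii", errors="replace")

# errors="surrogateescape" maps each bad byte to a lone surrogate, so the
# original bytes can be recovered exactly by encoding back the same way.
escaped = raw.decode("ascii", errors="surrogateescape")
restored = escaped.encode("ascii", errors="surrogateescape")

print(repr(ignored))    # only ".sh" survives
print(len(replaced))    # 8 replacement characters + 3 for ".sh"
print(restored == raw)  # True: nothing was lost
```

With 'ignore' the name is silently truncated, and with 'replace' it is readable but no longer matches the real command line; 'surrogateescape' keeps the information at the cost of returning strings that cannot be printed or encoded strictly.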

Note: besides cmdline(), the problem also affects the process name() and exe()
methods.

I'll also have to check what happens on systems other than Linux.

Original comment by g.rodola on 15 Feb 2014 at 8:13

GoogleCodeExporter commented 9 years ago
FWIW, ps replaces the invalid characters with "?", which mirrors Python's
errors="replace" behavior.
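This behavior can be mimicked as an illustration only (it is not how ps is implemented). Replacement happens per undecodable byte, so the four two-byte letters of æøåÅ become eight question marks, which is consistent with the ps output shown later in this thread. Replacing per character instead would give only four.

```python
raw = "æøåÅ.sh".encode("utf-8")

# Byte level: each undecodable byte becomes one U+FFFD, mapped here to "?".
# Each of æ, ø, å and Å takes two bytes in UTF-8, hence 8 question marks.
per_byte = raw.decode("ascii", errors="replace").replace("\ufffd", "?")

# Character level: encoding the text to ASCII replaces each unencodable
# character with a single "?", hence only 4 question marks.
per_char = "æøåÅ.sh".encode("ascii", errors="replace").decode("ascii")

print(per_byte)  # ????????.sh
print(per_char)  # ????.sh
```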

Original comment by g.rodola on 15 Feb 2014 at 8:22

GoogleCodeExporter commented 9 years ago
I think the problem is that if the user's locale is not set up correctly, the file
is not opened with UTF-8 encoding, even though the proc filesystem is (always?)
UTF-8 encoded on newer Linux systems.

As shown below, ps does not display the name correctly, but cat does, and if
encoding='UTF-8' is specified in Python, Python works as well. I don't think it
is correct to depend on the user's locale: how would a process created by a user
with a different locale be interpreted?

------
.. 15686]$ unset LC_CTYPE
.. 15686]$ unset LANG
.. 15686]$ ps auxww | grep 15686
xxx      15686  0.0  0.0 113116  1428 pts/7    S+   12:30   0:00 /bin/bash ./????????.sh
.. 15686]$ cat cmdline
/bin/bash./æøåÅ.sh
.. 15686]$ python3
Python 3.3.2 (default, Nov  7 2013, 10:01:05) 
[GCC 4.8.1 20130814 (Red Hat 4.8.1-6)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> with open('cmdline') as ll:
...     print(ll.read())
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib64/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 12: ordinal not in range(128)
>>> with open('cmdline', encoding='UTF-8') as ll:
...     print(ll.read())
... 
/bin/bash./æøåÅ.sh
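Below is a sketch of a locale-independent variant of the session above. It reads /proc/<pid>/cmdline in binary mode and decodes each NUL-separated argument with an explicit encoding. The helper names and the default errors='surrogateescape' are assumptions made for illustration, not psutil's API.

```python
import os


def parse_cmdline(raw, encoding="utf-8", errors="surrogateescape"):
    """Hypothetical helper: split NUL-separated cmdline bytes and decode each
    argument with an explicit encoding, ignoring the current locale."""
    # Empty pieces (e.g. after the trailing NUL) are dropped.
    return [arg.decode(encoding, errors) for arg in raw.split(b"\x00") if arg]


def read_cmdline(pid="self"):
    """Hypothetical helper (Linux only): read and parse /proc/<pid>/cmdline."""
    # Binary mode: open() performs no locale-dependent decoding.
    with open(os.path.join("/proc", str(pid), "cmdline"), "rb") as f:
        return parse_cmdline(f.read())


if __name__ == "__main__":
    # Bytes equivalent to the `cat cmdline` output above.
    sample = b"/bin/bash\x00./\xc3\xa6\xc3\xb8\xc3\xa5\xc3\x85.sh\x00"
    print(parse_cmdline(sample))  # ['/bin/bash', './æøåÅ.sh']
```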

Original comment by l...@hupfeldtit.dk on 16 Feb 2014 at 12:08

GoogleCodeExporter commented 9 years ago
Thanks for sharing this info.
It seems sys.getdefaultencoding() always returns 'utf-8' regardless of the
current locale, so that looks like the way to go on Python 3.
Fixed in revision 42c5b20d7f5b.
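Applied to the line from the original traceback, the approach described here amounts to passing an explicit encoding to open(). The sketch below is hypothetical and not necessarily the code in revision 42c5b20d7f5b. It keeps the list comprehension from the traceback and only adds encoding=sys.getdefaultencoding(). It is Linux only.

```python
import sys


def read_cmdline_text(pid="self"):
    """Hypothetical helper: read /proc/<pid>/cmdline in text mode (Linux only)."""
    # On Python 3, sys.getdefaultencoding() is 'utf-8' whatever the locale,
    # so decoding no longer depends on LANG, LC_CTYPE or LC_ALL.
    path = "/proc/%s/cmdline" % pid
    with open(path, encoding=sys.getdefaultencoding()) as f:
        # Same splitting as in the traceback: NUL-separated, empties dropped.
        return [x for x in f.read().split("\x00") if x]


if __name__ == "__main__":
    print(read_cmdline_text())
```

With the default errors='strict', a command line that is not valid UTF-8 would still raise UnicodeDecodeError, so the error-handling question raised earlier remains open with this approach.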

Original comment by g.rodola on 16 Feb 2014 at 1:37

GoogleCodeExporter commented 9 years ago
Thank you for providing psutil. It makes system management with Python so much
easier.

Original comment by l...@hupfeldtit.dk on 16 Feb 2014 at 1:45

GoogleCodeExporter commented 9 years ago
Glad to hear psutil is useful to you. 
Cheers. 

Original comment by g.rodola on 16 Feb 2014 at 4:41

GoogleCodeExporter commented 9 years ago

Original comment by g.rodola on 9 Mar 2014 at 10:26

GoogleCodeExporter commented 9 years ago
Closing as fixed now that version 2.0.0 is finally out.

Original comment by g.rodola on 10 Mar 2014 at 11:36