ome / omero-py

Python project containing Ice remoting code for OMERO
https://www.openmicroscopy.org/omero
GNU General Public License v2.0

Error handling when attempting to check log files #236

Closed chris-allan closed 4 years ago

chris-allan commented 4 years ago

I think this is a Python 2 to Python 3 hangover. Python 2 would have just silently mangled incorrectly encoded text, whereas Python 3 decodes everything to unicode and tries to use the encoding that's specified. A UnicodeDecodeError can then be raised during omero admin diagnostics if log files contain non-ASCII UTF-8 characters. For example:

Log dir:    /opt/omero/OMERO.current/var/log exists
Log files:  Blitz-0.log                    339.8 MB      errors=273  warnings=1024
Log files:  Blitz-0.log.1                  Traceback (most recent call last):
  File "/opt/omero/OMERO.venv36/bin/omero", line 118, in <module>
    rv = omero.cli.argv()
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1754, in argv
    cli.invoke(args[1:])
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1187, in invoke
    stop = self.onecmd(line, previous_args)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1264, in onecmd
    self.execute(line, previous_args)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1346, in execute
    args.func(args)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/install/windows_warning.py", line 26, in wrapper
    return func(self, *args, **kwargs)
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/plugins/prefs.py", line 79, in open_and_close_config
    return func(*args, **kwargs)
  File "/opt/omero/OMERO.venv36/lib/python3.6/site-packages/omero/plugins/admin.py", line 1417, in diagnostics
    parse_logs()
  File "/opt/omero/OMERO.venv36/lib/python3.6/site-packages/omero/plugins/admin.py", line 1371, in parse_logs
    self._exists(old_div(log_dir, x))
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero/cli.py", line 1117, in _exists
    for l in p.lines():
  File "/opt/omero/OMERO.venv36/lib64/python3.6/site-packages/omero_ext/path.py", line 938, in lines
    return f.readlines()
  File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1809: ordinal not in range(128)

Default encoding of our vendored path.py is specified as ascii here:

Since we are using PyPI for nearly everything now, it might be worth looking into removing our vendored version and adding a dependency on the path module.
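For reference, the failure in the traceback above can be reproduced outside OMERO. The sketch below is plain Python, not the vendored path.py code; the helper name `read_lines` and the sample log line are made up for illustration. It writes a line containing a UTF-8 right single quotation mark (first byte 0xe2, as in the traceback) and reads it back as ASCII.

```python
import os
import tempfile


def read_lines(path, encoding, errors="strict"):
    """Read all lines of a text file with an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding, errors=errors) as f:
        return f.readlines()


# Made-up log line; U+2019 encodes to the bytes e2 80 99 in UTF-8.
sample = "2020-07-01 ERROR can\u2019t open file\n"

fd, log_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

try:
    read_lines(log_path, encoding="ascii")
    failed = False
    message = ""
except UnicodeDecodeError as e:
    # Same class of failure as in the traceback: 'ascii' codec, byte 0xe2
    failed = True
    message = str(e)

# Reading with the encoding the file was actually written in succeeds
decoded = read_lines(log_path, encoding="utf-8")
os.remove(log_path)
```

With an ASCII default, any single non-ASCII byte in a log file is enough to abort the whole read.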

/cc @emilroz

manics commented 4 years ago

Is it possible for non-UTF-8 characters to end up in master.err or master.out? If so, how does this behave?

sbesson commented 4 years ago

At least in the context of the diagnostics command, I think the parsing of master.err and master.out, and generally of all files under the log directory, is covered by the same logic as the one fixed in this PR:

https://github.com/ome/omero-py/blob/f44d09e7c39d068a71a191ac979801dfed976705/src/omero/plugins/admin.py#L1374-L1383

chris-allan commented 4 years ago

@sbesson: In our case LANG=en_GB.UTF-8. I had a lot of trouble reproducing this consistently with a მიკროსკოპის.fake file. In fact, on the system in question there are also import issues with such a file. Let's find some time to try and debug together over the next few days.

chris-allan commented 4 years ago

To summarize today's sleuthing, the consensus was:

  1. Where possible the locale and default encodings of Python should be respected (on Linux this is via LANG and LC_ALL)
  2. Go with something similar to what @joshmoore has started in #224 and ignore decoding errors. After all, we are just trying to count errors and warnings here.
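A minimal sketch of the two points above, assuming that counting is all we need: open the log with the locale's preferred encoding and ignore undecodable bytes. The function name `count_levels` and the matching on " ERROR " and " WARN " substrings are assumptions for illustration, not the code in admin.py or in #224.

```python
import locale
import os
import tempfile


def count_levels(path, encoding=None):
    """Count ERROR and WARN lines, skipping bytes that cannot be decoded.

    Hypothetical helper: the substring matching is an assumption, not
    the logic used by `omero admin diagnostics`.
    """
    if encoding is None:
        # Respect the locale (LANG / LC_ALL on Linux)
        encoding = locale.getpreferredencoding(False)
    n_errors = n_warnings = 0
    # errors="ignore" drops undecodable bytes instead of raising
    with open(path, encoding=encoding, errors="ignore") as f:
        for line in f:
            if " ERROR " in line:
                n_errors += 1
            elif " WARN " in line:
                n_warnings += 1
    return n_errors, n_warnings


# Made-up log content with UTF-8 bytes and one stray invalid byte (0xff)
raw = (
    b"2020-07-01 10:00:00 INFO  started\n"
    b"2020-07-01 10:00:01 ERROR can\xe2\x80\x99t open file\n"
    b"2020-07-01 10:00:02 WARN  bad byte \xff here\n"
)
fd, log_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Forcing ASCII mimics the vendored path.py default; no exception is raised
counts_ascii = count_levels(log_path, encoding="ascii")
# Using the locale's preferred encoding gives the same counts
counts_locale = count_levels(log_path)
os.remove(log_path)
```

Dropping bytes loses some characters from the decoded lines, which is acceptable here because only the level markers are inspected.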

There seems to be a separate issue with Java 11 handling Unicode somewhere, resulting in ????????.fake files ending up in the managed repository when filenames are UTF-8. I will try to track this down separately. For completeness, some examples of how to check the encoding used by various dependencies of our stack follow:

$ jshell
|  Welcome to JShell -- Version 11.0.7
|  For an introduction type: /help intro
jshell> import java.nio.charset.Charset;
jshell> import java.util.Locale;
jshell> Locale.getDefault();
$3 ==> en_GB
jshell> Charset.defaultCharset();
$4 ==> UTF-8
jshell> System.getProperty("file.encoding");
$5 ==> "UTF-8"
jshell> System.getProperty("sun.jnu.encoding");
$6 ==> "UTF-8"

$ ipython
Python 3.6.8 (default, Apr  2 2020, 13:34:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import sys
In [2]: import locale
In [3]: locale.getpreferredencoding()
Out[3]: 'UTF-8'
In [4]: sys.getdefaultencoding()
Out[4]: 'utf-8'

$ psql omero
psql (9.6.18)
Type "help" for help.
omero=> SHOW SERVER_ENCODING;
 server_encoding
-----------------
 UTF8
(1 row)
omero=> SHOW CLIENT_ENCODING;
 client_encoding
-----------------
 UTF8
(1 row)
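
The Python check above can also be scripted. This is a sketch using only standard-library calls; the function name `python_encodings` is made up. In Python 3.6, `open()` in text mode without an explicit `encoding` uses `locale.getpreferredencoding(False)`, while file names go through `sys.getfilesystemencoding()`, so both are worth recording alongside `sys.getdefaultencoding()`.

```python
import locale
import sys


def python_encodings():
    """Collect the encodings Python consults in different situations."""
    return {
        # Used by open() in text mode when no encoding is given
        "preferred": locale.getpreferredencoding(False),
        # Default for str.encode() / bytes.decode() on Python 3
        "default": sys.getdefaultencoding(),
        # Used to convert file names to and from bytes
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of the stream that print() writes to (may be None)
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in python_encodings().items():
    print(f"{name:<12}{value}")
```

Running this under the same user and environment as the OMERO server is the relevant comparison, since LANG and LC_ALL can differ between an interactive shell and a service.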

chris-allan commented 4 years ago

References: