Closed chris-allan closed 4 years ago
Is it possible for non-UTF-8 characters to end up in `master.err` or `master.out`? If so, how does this behave?
At least in the context of the `diagnostic` command, I think the parsing of `master.err` and `master.out`, and generally all files under the `log` directory, is covered by the same logic as the one fixed in this PR.
@sbesson: In our case `LANG=en_GB.UTF-8`. I had lots of issues reproducing consistently with a `მიკროსკოპის.fake`. In fact, on the system in question there are also import issues with such a file. Let's find some time to try and debug together over the next few days.
To summarize today's sleuthing, the consensus was:
`LANG` and `LC_ALL`)
There seems to be a separate issue with Java 11 handling Unicode somewhere, resulting in `????????.fake` files ending up in the managed repository when filenames are UTF-8. I will try to track this down separately. For completeness, some examples of how to check the encoding of various dependencies of our stack follow:
$ jshell
| Welcome to JShell -- Version 11.0.7
| For an introduction type: /help intro
jshell> import java.nio.charset.Charset;
jshell> import java.util.Locale;
jshell> Locale.getDefault();
$3 ==> en_GB
jshell> Charset.defaultCharset();
$4 ==> UTF-8
jshell> System.getProperty("file.encoding");
$5 ==> "UTF-8"
jshell> System.getProperty("sun.jnu.encoding");
$6 ==> "UTF-8"
$ ipython
Python 3.6.8 (default, Apr 2 2020, 13:34:55)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import sys
In [2]: import locale
In [3]: locale.getpreferredencoding()
Out[3]: 'UTF-8'
In [4]: sys.getdefaultencoding()
Out[4]: 'utf-8'
$ psql omero
psql (9.6.18)
Type "help" for help.
omero=> SHOW SERVER_ENCODING;
server_encoding
-----------------
UTF8
(1 row)
omero=> SHOW CLIENT_ENCODING;
client_encoding
-----------------
UTF8
(1 row)
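The Python-side checks above can also be gathered into a small script, which is handy when comparing the environment of an interactive shell with that of a service process. This is only a sketch: the function name `encoding_report` and the choice of fields are mine, not anything from OMERO.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings visible to this Python process.

    Hypothetical helper for illustration; not part of any OMERO module.
    """
    return {
        # Locale environment variables (None if unset)
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding used by default for text files opened without encoding=
        "preferred_encoding": locale.getpreferredencoding(),
        # Default codec for str.encode() / bytes.decode() (utf-8 on Python 3)
        "default_encoding": sys.getdefaultencoding(),
        # Encoding used to convert filenames to and from bytes
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding of the attached stdout, if any
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

Running it once from a login shell and once from the environment the server is started in should show whether `LANG` or `LC_ALL` differ between the two.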
I think this is a Python 2 --> Python 3 hangover. Python 2 would have just silently corrupted incorrectly encoded text, whereas Python 3 decodes everything to Unicode and tries to use the encoding that's specified.
`UnicodeDecodeError`s can then be thrown during `omero admin diagnostics` if log files contain non-ASCII UTF-8 content. For example:
Default encoding of our vendored `path.py` is specified as `ascii` here:
Since we are using PyPI for nearly everything now, it might be worth looking into removing our vendored version and adding a dependency on the `path` module.
/cc @emilroz
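To make the failure mode concrete, here is a minimal sketch of what happens when UTF-8 log bytes are decoded as `ascii`, and of one tolerant alternative. The helper names and the sample log line are invented for illustration; this is not the code in the vendored `path.py` or in the diagnostics command.

```python
# A log line containing a non-ASCII filename, as the bytes stored on disk.
# The wording of the line is made up for this example.
LOG_LINE = "Imported მიკროსკოპის.fake\n".encode("utf-8")


def decode_strict(raw, encoding):
    """Decode with the default 'strict' error handler (raises on bad bytes)."""
    return raw.decode(encoding)


def decode_tolerant(raw, encoding="utf-8"):
    """Decode without raising; undecodable bytes become U+FFFD."""
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    try:
        decode_strict(LOG_LINE, "ascii")
    except UnicodeDecodeError as exc:
        # This is the error class reported for the diagnostics command.
        print("ascii decode failed:", exc.reason)

    # Decoding with the encoding the file was written in works.
    print(decode_strict(LOG_LINE, "utf-8"), end="")

    # A stray Latin-1 byte (0xE9) is not valid UTF-8 on its own;
    # the tolerant variant substitutes it instead of raising.
    print(decode_tolerant(b"caf\xe9 done\n"), end="")
```

Whether to replace bad bytes or fail loudly is a design choice. For a diagnostics command that only summarizes logs, substituting seems the safer default, which would also cover the earlier question about non-UTF-8 bytes ending up in `master.err` or `master.out`.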