sagemath / sage

Main repository of SageMath
https://www.sagemath.org

Python 3.7+: setlocale: LC_ALL: cannot change locale (C.UTF-8) from build/bin/sage-spkg and in doctests; disable use of system Python 3.6 #30053

Closed dimpase closed 4 years ago

dimpase commented 4 years ago

In #29033, LC_ALL in build/bin/sage-spkg was changed to C.UTF-8. However, not all systems have it.

See also:

Follow-up:

CC: @antonio-rojas @mkoeppe @slel @orlitzky @kiwifb @embray @fchapoton

Component: build

Author: Dima Pasechnik, Matthias Koeppe

Branch: be47518

Reviewer: Matthias Koeppe, Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/30053

dimpase commented 4 years ago
comment:3

an easy way out is just to check whether the locale change worked, and if not, use C locale, not C.UTF-8. Perhaps print a warning.
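
A minimal sketch of this "try it, then fall back" idea, written in Python rather than in the shell of build/bin/sage-spkg. The function name `first_working_locale` is made up for illustration and is not part of Sage; the sketch only shows the logic, not the actual fix.

```python
import locale


def first_working_locale(candidates=("C.UTF-8", "C")):
    """Return the first name in `candidates` that setlocale() accepts, else None.

    The process locale is restored afterwards, so this is only a probe.
    """
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_ALL, name)
            except locale.Error:
                continue  # this locale is not available on the system
            return name
        return None
    finally:
        locale.setlocale(locale.LC_ALL, saved)


chosen = first_working_locale()
if chosen != "C.UTF-8":
    print("warning: C.UTF-8 is not available, using", chosen)
```

Since the plain C locale always exists, the default candidate list never yields None; the warning branch corresponds to the "perhaps print a warning" part of the suggestion.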

mkoeppe commented 4 years ago
comment:5

Replying to @dimpase:

an easy way out is just to check whether the locale change worked, and if not, use C locale, not C.UTF-8.

+1

dimpase commented 4 years ago

Author: Dima Pasechnik

dimpase commented 4 years ago

Commit: 37e042c

dimpase commented 4 years ago

New commits:

37e042c only use locale C.UTF-8 if available, else C
dimpase commented 4 years ago

Branch: u/dimpase/build/careful_with_C_UTF8

kliem commented 4 years ago
comment:7

Either this ticket is a duplicate of #30008 or it should make #30008 obsolete.

kliem commented 4 years ago
comment:8

I started test runs:

https://github.com/kliem/sage/pull/20/checks

dimpase commented 4 years ago
comment:9

this ticket was a result of a bug report on Arch, not centos. Hopefully it works for #30008 too.

kliem commented 4 years ago
comment:10

Tests did not complete, because the 9.2.beta3 tests fail everywhere.

https://github.com/sagemath/sage/actions/runs/157607524

Is this a github issue or have we broken sage?

 Step 13/18 : RUN ./bootstrap
 ---> Running in 89427ef5c1c4
rm -rf config configure build/make/Makefile-auto.in
rm -f src/doc/en/installation/*.txt
rm -rf src/doc/en/reference/spkg/*.rst
rm -f src/doc/en/reference/repl/*.txt
src/doc/bootstrap:48: installing src/doc/en/installation/arch.txt and src/doc/en/installation/arch-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/debian.txt and src/doc/en/installation/debian-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/fedora.txt and src/doc/en/installation/fedora-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/cygwin.txt and src/doc/en/installation/cygwin-optional.txt
src/doc/bootstrap:48: installing src/doc/en/installation/homebrew.txt and src/doc/en/installation/homebrew-optional.txt
src/doc/bootstrap:55: installing src/doc/en/reference/spkg/*.rst
src/doc/bootstrap:83: installing src/doc/en/reference/repl/options.txt
src/doc/bootstrap: line 84: src/doc/en/reference/repl/options.txt: No such file or directory
The command '/bin/sh -c ./bootstrap' returned a non-zero code: 1
kliem commented 4 years ago
comment:11

I just found #30064. Edit: I was CC'd on that, but I didn't realize how serious this is.

Ok. I'll run a new test then.

kliem commented 4 years ago
comment:12

This breaks building sphinx on windows.

https://github.com/kliem/sage/runs/838933940

Same error as #30008.

As far as I understand the problem is that we need some sort of UTF to make the sphinx build work. It appears that on cygwin the default is better than C and C.UTF-8 does not work. So maybe C is not the best alternative for C.UTF-8.

Btw, strangely centos 7 appears to work with the current beta. I don't know what happened. (And I don't know yet, if this behavior is stable).

kliem commented 4 years ago
comment:13

And it breaks centos 8.

embray commented 4 years ago
comment:14

Replying to @kliem:

This breaks building sphinx on windows.

https://github.com/kliem/sage/runs/838933940

Same error as #30008.

As far as I understand the problem is that we need some sort of UTF to make the sphinx build work. It appears that on cygwin the default is better than C and C.UTF-8 does not work. So maybe C is not the best alternative for C.UTF-8.

I'm not sure what you mean here. C.UTF-8 is supported on Cygwin and is in fact the default locale in absence of any other settings: https://www.cygwin.com/cygwin-ug-net/setup-locale.html

The default locale in the absence of the aforementioned locale environment variables is "C.UTF-8".

dimpase commented 4 years ago
comment:15

the error UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 45: ordinal not in range(128):

2020-07-05T16:29:58.1676169Z [sphinx-3.0.4.p0] installing. Log file: /cygdrive/d/a/sage/sage/logs/pkgs/sphinx-3.0.4.p0.log
2020-07-05T16:30:01.7883565Z   [sphinx-3.0.4.p0] error installing, exit status 1. End of log file:
2020-07-05T16:30:01.8141086Z   [sphinx-3.0.4.p0]   Found local metadata for sphinx-3.0.4.p0
2020-07-05T16:30:01.8155620Z   [sphinx-3.0.4.p0]   Attempting to download package Sphinx-3.0.4.tar.gz from mirrors
2020-07-05T16:30:01.8169727Z   [sphinx-3.0.4.p0]   http://mirrors.mit.edu/sage/spkg/upstream/sphinx/Sphinx-3.0.4.tar.gz
2020-07-05T16:30:01.8174798Z   [sphinx-3.0.4.p0]   [......................................................................]
2020-07-05T16:30:01.8191111Z   [sphinx-3.0.4.p0]   sphinx-3.0.4.p0
2020-07-05T16:30:01.8193080Z   [sphinx-3.0.4.p0]   ====================================================
2020-07-05T16:30:01.8197343Z   [sphinx-3.0.4.p0]   Setting up build directory for sphinx-3.0.4.p0
2020-07-05T16:30:01.8217424Z   [sphinx-3.0.4.p0]   Traceback (most recent call last):
2020-07-05T16:30:01.8221790Z   [sphinx-3.0.4.p0]     File "/cygdrive/d/a/sage/sage/build/bin/sage-uncompress-spkg", line 23, in <module>
2020-07-05T16:30:01.8222267Z   [sphinx-3.0.4.p0]       run()
2020-07-05T16:30:01.8222576Z   [sphinx-3.0.4.p0]     File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/cmdline.py", line 72, in run
2020-07-05T16:30:01.8222857Z   [sphinx-3.0.4.p0]       unpack_archive(archive, dirname)
2020-07-05T16:30:01.8223251Z   [sphinx-3.0.4.p0]     File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/action.py", line 68, in unpack_archive
2020-07-05T16:30:01.8223583Z   [sphinx-3.0.4.p0]       archive.extractall(members=archive.names)
2020-07-05T16:30:01.8223861Z   [sphinx-3.0.4.p0]     File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 96, in extractall
2020-07-05T16:30:01.8224117Z   [sphinx-3.0.4.p0]       **kwargs)
2020-07-05T16:30:01.8224323Z   [sphinx-3.0.4.p0]     File "/usr/lib/python3.6/tarfile.py", line 2010, in extractall
2020-07-05T16:30:01.8224793Z   [sphinx-3.0.4.p0]       numeric_owner=numeric_owner)
2020-07-05T16:30:01.8225151Z   [sphinx-3.0.4.p0]     File "/usr/lib/python3.6/tarfile.py", line 2052, in extract
2020-07-05T16:30:01.8225442Z   [sphinx-3.0.4.p0]       numeric_owner=numeric_owner)
2020-07-05T16:30:01.8225898Z   [sphinx-3.0.4.p0]     File "/cygdrive/d/a/sage/sage/build/bin/../sage_bootstrap/uncompress/tar_file.py", line 122, in _extract_member
2020-07-05T16:30:01.8226166Z   [sphinx-3.0.4.p0]       **kwargs)
2020-07-05T16:30:01.8226601Z   [sphinx-3.0.4.p0]     File "/usr/lib/python3.6/tarfile.py", line 2122, in _extract_member
2020-07-05T16:30:01.8226883Z   [sphinx-3.0.4.p0]       self.makefile(tarinfo, targetpath)
2020-07-05T16:30:01.8227488Z   [sphinx-3.0.4.p0]     File "/usr/lib/python3.6/tarfile.py", line 2163, in makefile
2020-07-05T16:30:01.8227940Z   [sphinx-3.0.4.p0]       with bltn_open(targetpath, "wb") as target:
2020-07-05T16:30:01.8228249Z   [sphinx-3.0.4.p0]   UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 45: ordinal not in range(128)
2020-07-05T16:30:01.8228542Z   [sphinx-3.0.4.p0]   ************************************************************************
2020-07-05T16:30:01.8228988Z   [sphinx-3.0.4.p0]   Error: failed to extract /cygdrive/d/a/sage/sage/upstream/Sphinx-3.0.4.tar.gz
2020-07-05T16:30:01.8229264Z   [sphinx-3.0.4.p0]   ************************************************************************
2020-07-05T16:30:01.8229537Z   [sphinx-3.0.4.p0] Full log file: /cygdrive/d/a/sage/sage/logs/pkgs/sphinx-3.0.4.p0.log
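
The last frame suggests that the failure is in encoding a file name, not file contents: opening the target path means converting the name to bytes with the file system encoding, which appears to be ASCII here. A small illustration, assuming an archive member whose name contains ä (U+00E4, the '\xe4' of the error message). The path below is made up and not taken from the log.

```python
import sys


def can_encode(name, encoding):
    """Return True if `name` can be converted to bytes with `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical archive member name containing U+00E4; not copied from the log.
member = "some-package/tests/testim\xe4ge.png"

print("file system encoding:", sys.getfilesystemencoding())
print("encodable as ascii:", can_encode(member, "ascii"))
print("encodable as utf-8:", can_encode(member, "utf-8"))
```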
dimpase commented 4 years ago
comment:16

could it be that locale on Cygwin is not installed by default?

However, https://www.cygwin.com/cygwin-ug-net/setup-locale.html says:

Note
For a list of locales supported by your Windows machine, use the new locale -a command, which is part of the Cygwin package. For a description see locale(1)
kliem commented 4 years ago
comment:17

Replying to @kliem:

I started test runs:

https://github.com/kliem/sage/pull/20/checks

I just started rerunning those tests on top of the current beta. Maybe that stuff just goes away by itself.

kliem commented 4 years ago
comment:18

Still causes this error.

antonio-rojas commented 4 years ago
comment:19

If the centos issue is caused by the sphinx upgrade (according to #30008), why is it blocking this? This is meant to fix another (very annoying) issue on Arch.

kliem commented 4 years ago
comment:20

It appears that #30008 fixed itself. However, this here broke the cygwin sphinx build, last I checked.

I have no clue what is going on, but with this ticket we go from passing to failing.

mkoeppe commented 4 years ago
comment:21

It seems a default setting on Cygwin is LANG=en_US.UTF-8. Perhaps we can try to only set LC_ALL if LANG is not already set or something like this.

mkoeppe commented 4 years ago
comment:22

Also it should be investigated whether it was really necessary to add this line in #29033 to achieve Python 3.6 support. In particular note that sage-uncompress-spkg uses sage-system-python (which can even be python2) -- which really has nothing to do with Python 3.6 support (which is about PYTHON_FOR_VENV).

orlitzky commented 4 years ago
comment:25

What problems arise if we drop the locale mangling entirely? Trac #15791 doesn't mention a problem.

mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -1,2 +1,5 @@
 In #29033 in `build/bin/sage-spkg` LC_ALL was changed to C.UTF-8
 However, not all systems have it. 
+
+There are also some other locale problems that show up in doctests
+(for example https://groups.google.com/d/msg/sage-release/spalYgXKr-4/ZVsbgHIlAgAJ)
mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -3,3 +3,7 @@

 There are also some other locale problems that show up in doctests
 (for example https://groups.google.com/d/msg/sage-release/spalYgXKr-4/ZVsbgHIlAgAJ)
+
+
+See also:
+- #22659
mwageringel commented 4 years ago
comment:30

I am not sure if this is related, but while compiling Cypari on macOS, every file gives a warning of this type:

    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
        LC_ALL = "C.UTF-8",
        LC_TERMINAL = "iTerm2",
        LANG = "de_DE.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to a fallback locale ("de_DE.UTF-8").

The fallback that is used seems to work for me, though.

orlitzky commented 4 years ago
comment:31

I just did a fresh build with LC_ALL=C and see no outstanding problems (python-3.7.8).

Maybe we should just revert that line? Why rock the boat?

Erik is the only other person who might know why it was added.

mkoeppe commented 4 years ago
comment:32

This will need testing on centos-8 with Python 3.6

embray commented 4 years ago
comment:33

Replying to @orlitzky:

I just did a fresh build with LC_ALL=C and see no outstanding problems (python-3.7.8).

Maybe we should just revert that line? Why rock the boat?

No. This was added for reasons. Specifically to ensure compatibility between how Python 3.6 and Python 3.7 set the default encoding. Without this, there were bugs on Python 3.6 with Python not using a unicode character encoding by default. See https://www.python.org/dev/peps/pep-0538/

The simplest way to deal with this problem for currently released versions of CPython is to explicitly set a more sensible locale when launching the application. For example:

LC_CTYPE=C.UTF-8 python3 ...

The C.UTF-8 locale is a full locale definition that uses UTF-8 for the LC_CTYPE category, and the same settings as the C locale for all other categories (including LC_COLLATE). It is offered by a number of Linux distributions (including Debian, Ubuntu, Fedora, Alpine and Android) as an alternative to the ASCII-based C locale. Some other platforms (such as HP-UX) offer an equivalent locale definition under the name C.utf8.

Mac OS X and other *BSD systems have taken a different approach: instead of offering a C.UTF-8 locale, they offer a partial UTF-8 locale that only defines the LC_CTYPE category. On such systems, the preferred environmental locale adjustment is to set LC_CTYPE=UTF-8 rather than to set LC_ALL or LANG.

Perhaps this should also try the LC_CTYPE=UTF-8 mentioned here. Otherwise Dima's approach makes sense, though it can't guarantee that everything will just work on those systems. I can't recall exactly what broke without it but I do recall there was something. With Python 3.7 (the default when not using the system Python) this shouldn't be a problem since Python will basically force a UTF-8 locale for itself.

dimpase commented 4 years ago
comment:34

Replying to @embray:

Perhaps this should also try the LC_CTYPE=UTF-8 mentioned here. Otherwise Dima's approach makes sense, though it can't guarantee that everything will just work on those systems. I can't recall exactly what broke without it but I do recall there was something. With Python 3.7 (the default when not using the system Python) this shouldn't be a problem since Python will basically force a UTF-8 locale for itself.

How about only doing this export LC_...= on Python3.x with x<7 ? This should in particular make Arch people (apparently Arch has no C.UTF-8 or a similar locale, everything UTF-8 there is language-specific) happy, as their Python is new enough.

orlitzky commented 4 years ago
comment:35

Ok, thanks for the information. I think the major take-away from PEP538 is,

With this change, any *nix platform that does not offer at least one of the C.UTF-8, C.utf8 or UTF-8 locales as part of its standard configuration would only be considered a fully supported platform for CPython 3.7+ deployments when a suitable locale other than the default C locale is configured explicitly (e.g. en_AU.UTF-8, zh_CN.gb18030).

I'm pretty sure we have files that actually need the UTF-8 encoding by now, so that rules out the possibility of "doing nothing" (leaving LC_ALL=C or LC_CTYPE=C) on python-3.6. And if we want to make python-3.6 work the way that python-3.7 does, then we're in the same situation as upstream is with respect to C.UTF-8: we have to consider python-3.6 with no C.UTF-8 (or equivalent) unsupported.

So I see two real options left:

  1. Try to set the locale to C.UTF-8 or C.utf8 or UTF-8 when python-3.6 is being used, and declare the system unsupported if we can't. If python-3.7+ is being used, we can set the locale to C, and it will coerce the locale to something utf8ish on its own. In either case, a lack of utf8 locale would be unsupported.
  2. Don't set a locale at all, and rely on the distribution/user to set a utf8 locale by default. This would result in some confusing grep/sort behavior (they're locale-dependent), but maybe we could be extra careful in our SPKGs to work around that, so that e.g. en_US.UTF-8 would work too.

Long-term, as one of the largest python projects in existence, I think we probably have to suck it up and go with (1), even though it pains me to require a locale that glibc doesn't even ship and isn't POSIX. Whatever decisions python makes, we're stuck with.

orlitzky commented 4 years ago
comment:36

Replying to @dimpase:

How about only doing this export LC_...= on Python3.x with x<7 ? This should in particular make Arch people (apparently Arch has no C.UTF-8 or a similar locale, everything UTF-8 there is language-specific) happy, as their Python is new enough.

I think a combination of this and the current branch is the best we can do. On python-3.6, we should try to set LC_ALL to C.UTF-8, C.utf8, or UTF-8. If we can't, then we should leave it alone and pray that the user's locale is compatible with all of our SPKGs. That situation would be unsupported by sage.

On python-3.7+, we can set LC_ALL=C, and python itself will try to pick an appropriate UTF-8 version of the locale. What does python on arch do in this situation? It's possible that python itself will fail to find a suitable UTF-8 locale, but there's not a lot we can do if upstream python insists on a nonstandard locale. Arch will just have to reconsider their decision unless they want to be not-fully supported by upstream python-3.7+.
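
The decision described in this comment can be sketched as a small pure function. This is an illustration only: the name `choose_lc_all` and its arguments are hypothetical, and the candidate list is the one named above (C.UTF-8, C.utf8, UTF-8). A return value of None stands for "leave the user's locale alone", i.e. the unsupported case.

```python
def choose_lc_all(python_version, available_locales):
    """Pick a value for LC_ALL, or None to leave the environment untouched.

    python_version    -- tuple such as (3, 6) or (3, 8)
    available_locales -- iterable of locale names, e.g. from `locale -a`
    """
    if python_version >= (3, 7):
        # Python 3.7+ is expected to pick a UTF-8 locale for itself.
        return "C"
    available = set(available_locales)
    for name in ("C.UTF-8", "C.utf8", "UTF-8"):
        if name in available:
            return name
    # Python 3.6 without any of the candidates: unsupported, do nothing.
    return None


print(choose_lc_all((3, 8), ["C", "POSIX"]))
print(choose_lc_all((3, 6), ["C", "C.UTF-8", "POSIX"]))
print(choose_lc_all((3, 6), ["C", "en_US.utf8", "POSIX"]))
```

Keeping the choice separate from the code that exports the variable makes it easy to check each platform case (Arch, CentOS 8, Cygwin, macOS) without needing that platform.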

embray commented 4 years ago
comment:37

Personally I don't care what glibc or POSIX say on this. I think 1) is a fine option.

orlitzky commented 4 years ago
comment:38

Apparently we carry the C.UTF-8 patch in Gentoo for systemd, who definitely don't care about portability:

https://github.com/systemd/systemd/pull/10742

mkoeppe commented 4 years ago
comment:39

Is anyone working on this?

vbraun commented 4 years ago
comment:40

Sage-the-python-library can't realistically support anything that CPython does not; it's 2020, who in their right mind doesn't support utf-8? Better diagnostics for non-compliant systems would be great but imho not a blocker.

orlitzky commented 4 years ago
comment:41

Replying to @vbraun:

Sage-the-python-library can't realistically support anything that CPython does not; it's 2020, who in their right mind doesn't support utf-8? Better diagnostics for non-compliant systems would be great but imho not a blocker.

These systems do support UTF-8, but not the (as of yet) non-standard C.UTF-8 locale.

The current branch has the right idea, but since python-3.6 and python-3.7 act differently, it can be made a bit more precise. With python-3.7+, we can set LC_ALL=C and let python do the guessing. (Maybe it doesn't succeed, but officially Not Our Problem at that point.) With python-3.6, we can check for the C.UTF-8 locale and set it when found, with a fallback to LC_ALL=C. The current branch does this unconditionally, but it should only do it for python-3.6, and we should check the other equivalent names C.utf8 and UTF-8 too.

I think it's worthwhile to not output a million scary error messages in the sage-9.2 release on these systems that have done nothing wrong. At the very least, we owe it to the Arch maintainers who do a lot for sage and would have to field the resulting bug reports (or patch this themselves). I'm sure there are BSDs where this is problematic too.

dimpase commented 4 years ago
comment:42

Arch locales maintainers just need to get the C.UTF-8 locale; they are being silly (their argument is "it's an evil coming from Debian", and I'm told they won't reopen the corresponding issue, as it's "decided"). Meanwhile everybody else has the C.UTF-8 locale; it's just them who don't.

mkoeppe commented 4 years ago
comment:43

It's a blocker because it is a regression regarding platform support.

Can we please get a fix done?

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

e5f6663 only use locale C.UTF-8 if available, else C
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 4 years ago

Changed commit from 37e042c to e5f6663

dimpase commented 4 years ago
comment:45

rebased over the latest beta

mkoeppe commented 4 years ago

Description changed:

--- 
+++ 
@@ -4,6 +4,17 @@
 There are also some other locale problems that show up in doctests
 (for example https://groups.google.com/d/msg/sage-release/spalYgXKr-4/ZVsbgHIlAgAJ)

+And a failure building the documentation on `ubuntu-bionic-standard` (using `/usr/bin/python3.6`, https://github.com/mkoeppe/sage/runs/1106251169):
+
+```
+  [dochtml]   UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2661: ordinal not in range(128)
+  [dochtml] Full log file: logs/dochtml.log
+Makefile:1876: recipe for target 'doc-html' failed
+```
+
+
+
+

 See also:
 - #22659
seblabbe commented 4 years ago
comment:47

Using a freshly installed Ubuntu 18.04 (bionic) with some French settings set somewhere, so that $ git pull returns Déjà à jour (the French equivalent of Already up to date), running make on 9.2.beta12 yields the [dochtml] UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2661: ordinal not in range(128) error (see this post) right from the start of the documentation build.

With beta12 + current branch, I still get the same error after make doc-clean && make.

seblabbe commented 4 years ago
comment:48

That being said, I now see where it comes from:

sage: with open('src/doc/en/reference/references/index.rst', 'r') as f: s = f.read()                              
sage: s[2600:2700]                                                                                   
' characteristic,* The Open Book Series, vol. 2, no. 1, pp. 37–53, Jan. 2019.\n\n.. [ABZ2007] \\R. Aharo'
sage: s[2661:2700]                                                                                   
'–53, Jan. 2019.\n\n.. [ABZ2007] \\R. Aharo'

The character – has many occurrences in that file and is probably not the only non-ASCII character there (for example, in author names). So I am not sure replacing them by -- is the correct fix. And I don't see how this is related at all to the C.UTF-8 configuration.

dimpase commented 4 years ago
comment:49

So your machine has no C.UTF-8 locale installed, right?

seblabbe commented 4 years ago
comment:50

How can I figure this out? If it can help, I have this:

$ locale
LANG=fr_CA.UTF-8
LANGUAGE=fr_CA:fr_FR:en_GB:en
LC_CTYPE="fr_CA.UTF-8"
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_COLLATE="fr_CA.UTF-8"
LC_MONETARY=fr_FR.UTF-8
LC_MESSAGES="fr_CA.UTF-8"
LC_PAPER=fr_FR.UTF-8
LC_NAME=fr_FR.UTF-8
LC_ADDRESS=fr_FR.UTF-8
LC_TELEPHONE=fr_FR.UTF-8
LC_MEASUREMENT=fr_FR.UTF-8
LC_IDENTIFICATION=fr_FR.UTF-8
LC_ALL=
slabbe@miami ~ $ man locale
slabbe@miami ~ $ locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
fr_BE.utf8
fr_CA.utf8
fr_CH.utf8
french
fr_FR
fr_FR.iso88591
fr_FR.utf8
fr_LU.utf8
POSIX
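
The list above does contain C.UTF-8, so that locale is installed on this machine. For reference, here is a small sketch of how such a check could be done on `locale -a` output by exact name, rather than by substring match. The helper names are made up, and treating `UTF-8` and `utf8` as the same codeset spelling is an assumption based on how the names appear in listings like the one above.

```python
def normalize(name):
    """Lowercase the codeset and drop '-', so 'C.UTF-8' compares equal to 'C.utf8'."""
    base, dot, codeset = name.strip().partition(".")
    return base + dot + codeset.lower().replace("-", "")


def has_locale(wanted, locale_a_output):
    """Return True if `wanted` is listed in the text printed by `locale -a`."""
    names = {
        normalize(line)
        for line in locale_a_output.splitlines()
        if line.strip()
    }
    return normalize(wanted) in names


# A shortened copy of the listing above.
listing = "C\nC.UTF-8\nen_US.utf8\nfr_CA.utf8\nfr_FR\nPOSIX\n"

print(has_locale("C.UTF-8", listing))
print(has_locale("fr_CA.UTF-8", listing))
print(has_locale("de_DE.UTF-8", listing))
```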
dimpase commented 4 years ago
comment:51

in

+if test x`locale -a | grep C\.UTF-8` != x; then
+ export LC_ALL=C.UTF-8;
+else
+ export LC_ALL=C;
+fi

bit of this branch, could you change both LC_ALL to LC_CTYPE and try if it helps?

seblabbe commented 4 years ago
comment:52

Replying to @dimpase:

bit of this branch, could you change both LC_ALL to LC_CTYPE and try if it helps?

Same error after make doc-clean and make.

seblabbe commented 4 years ago
comment:53

To me, it seems like an error of the following kind: we are opening the src/doc/en/reference/references/index.rst file as bytes, and at some place (where?) we decode the bytes as ASCII and then get a UnicodeDecodeError, because the content is not ASCII at all. Here is a way to reproduce the same error message:

sage: with open('src/doc/en/reference/references/index.rst', 'rb') as f: b = f.read()                
sage: b.decode('ascii')                                                                              
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-13-498050d5a3fb> in <module>
----> 1 b.decode('ascii')

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 2661: ordinal not in range(128)
sage: s = b.decode('utf-8') 

I am adding Frédéric in cc since he fixed a lot of those UnicodeDecodeError in recent times with the passage to Python 3.

Also, why does this work in Python 3.8 and not in Python 3.6?
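
One possible explanation, offered as a sketch and not as a diagnosis of the doc build: in text mode without an explicit encoding, open() uses locale.getpreferredencoding(False). Under LC_ALL=C that is ASCII on Python 3.6, whereas Python 3.7+ (PEP 538 and PEP 540) switches to UTF-8 in the C locale by itself. Passing encoding="utf-8" explicitly makes the read independent of the locale. The sample text and temporary file below are made up for illustration.

```python
import locale
import os
import tempfile

# Sample text with an en dash (U+2013), like the "pp. 37-53" page range above.
sample = "pp. 37\u201353, Jan. 2019.\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.rst")
    with open(path, "wb") as f:
        f.write(sample.encode("utf-8"))

    # Explicit encoding: the result does not depend on LC_ALL or LC_CTYPE.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    with open(path, "rb") as f:
        raw = f.read()

# Decoding the same bytes as ASCII fails at the first byte of the en dash.
try:
    raw.decode("ascii")
    bad_byte = None
except UnicodeDecodeError as exc:
    bad_byte = raw[exc.start]

print("default text encoding:", locale.getpreferredencoding(False))
print("first undecodable byte:", bad_byte)
```

The first byte of the UTF-8 encoding of the en dash is 0xe2, which matches the byte reported in the error message.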