jupyter / notebook

Jupyter Interactive Notebook
https://jupyter-notebook.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Notebook output display wrong when using certain UTF8 characters #5767

Open issue, opened by jotterbach 3 years ago

jotterbach commented 3 years ago

When creating a string like

weird_string = '[" \\u0635\\u063a\\u064a\\u0631",7647]\n'
eval(weird_string)

the output displayed by the notebook is misformatted. However, the underlying data structure is intact (i.e. the string is element 0 and the integer is element 1). This only happens for some output: the characters here are Arabic, while UTF-8 characters from scripts closer to the Latin alphabet display correctly. Running the same command directly in the underlying IPython kernel gives the expected output (see the screenshots below).

Actual: [screenshot: misformatted output of the string]

Expected: [screenshot: same output as when running in the IPython kernel directly]
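A quick way to check the claim that only the display is affected is to inspect the parsed value instead of its rendering. The sketch below is not part of the original report: it swaps `eval` for `ast.literal_eval` (which accepts only Python literals and is therefore safer on arbitrary text) and prints the type of each element plus the Unicode name of every character in the string, so nothing depends on how the frontend lays out right-to-left text.

```python
import ast
import unicodedata

# The string from the report: a list literal whose first element is a space
# followed by four \u-escaped Arabic letters, and whose second is an int.
weird_string = '[" \\u0635\\u063a\\u064a\\u0631",7647]\n'

# literal_eval parses the same literal as eval() but refuses arbitrary code.
value = ast.literal_eval(weird_string)

word, number = value
print(type(word).__name__, type(number).__name__)  # str int

# Print code points and names instead of glyphs, so the output is
# unaffected by any bidirectional reordering in the renderer.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

If the element types and code points come out as expected, the data is fine and the problem is purely in how the line is rendered.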

Version

$ jupyter --version
jupyter core     : 4.6.3
jupyter-notebook : 6.0.3
qtconsole        : 4.7.4
ipython          : 7.15.0
ipykernel        : 5.3.0
jupyter client   : 6.1.3
jupyter lab      : not installed
nbconvert        : 5.6.1
ipywidgets       : 7.5.1
nbformat         : 5.0.6
traitlets        : 4.3.3

kevin-bates commented 3 years ago

Hmm, this looks like some kind of RTL (right-to-left) configuration is creeping in here. I see the same output the notebook is showing in Python 3.8 (also 3.6 and 3.5), as well as in IPython 7.15. Might you have an RTL locale configured? What behavior do you see from a pure Python REPL?

$ python
Python 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:19) 
[Clang 9.0.0 (tags/RELEASE_900/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> weird_string = '[" \\u0635\\u063a\\u064a\\u0631",7647]\n'
>>> eval(weird_string)
[' صغير', 7647]
>>> quit()

$ ipython
Python 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:19) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: weird_string = '[" \\u0635\\u063a\\u064a\\u0631",7647]\n'
   ...: eval(weird_string)
Out[1]: [' صغير', 7647]
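A possible explanation, offered as a sketch rather than a confirmed diagnosis: browsers and many terminals lay out text with the Unicode Bidirectional Algorithm (UAX #9), independent of the Python locale. In the repr `[' …', 7647]`, the digits follow an Arabic word, so rule W2 reclassifies them from European numbers (EN) to Arabic numbers (AN). Rule N1 then resolves the quote, comma and space between the word and the number to right-to-left, because numbers count as R when resolving neutrals. As a result, the whole span from the Arabic word to `7647` is drawn as one reversed run. The snippet lists each character's bidi class via `unicodedata.bidirectional` and applies a simplified W2 only. The helper names `bidi_classes` and `apply_rule_w2` are hypothetical, and the code assumes a single left-to-right paragraph with no explicit embeddings.

```python
import unicodedata

# Assumption: the renderer treats this line as one LTR paragraph with no
# explicit embeddings or isolates (as an HTML <pre> normally would).
text = repr([" \u0635\u063a\u064a\u0631", 7647])  # the repr the REPL prints

def bidi_classes(s):
    """Return the UAX #9 bidirectional class of each character in s."""
    return [unicodedata.bidirectional(ch) for ch in s]

def apply_rule_w2(classes, sos="L"):
    """Hypothetical helper: simplified UAX #9 rule W2 only.

    An EN whose closest preceding strong type (L, R or AL) is AL becomes AN.
    `sos` is the start-of-sequence type; no other bidi rules are applied.
    """
    resolved = []
    last_strong = sos
    for cls in classes:
        if cls in ("L", "R", "AL"):
            last_strong = cls
        elif cls == "EN" and last_strong == "AL":
            cls = "AN"
        resolved.append(cls)
    return resolved

raw = bidi_classes(text)
w2 = apply_rule_w2(raw)
for ch, before, after in zip(text, raw, w2):
    print(f"U+{ord(ch):04X} {before:>3} -> {after:<3} {unicodedata.name(ch)}")

# One possible workaround: ascii() escapes every non-ASCII character,
# so the printed line contains no right-to-left text to reorder.
print(ascii([" \u0635\u063a\u064a\u0631", 7647]))
```

Running the same helpers on `"['abc', 7647]"` leaves the digits as EN, which matches the report that strings from scripts closer to the Latin alphabet display correctly. Printing with `ascii()` trades readability of the Arabic text for a stable left-to-right layout.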