Open oryxius opened 6 years ago
Thanks! Let's see if we can narrow down the problem to which part of the ecosystem is causing the issue. It appears that this happens even in ipython 6.2.1 (python 3.6.4):
Python 3.6.4 | packaged by conda-forge | (default, Dec 23 2017, 16:54:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: en = '7X'
...: print (en)
...: ar = 'عربي'
...: print (ar)
...: print ([en, ar])
...: print ([ar, en])
...:
7X
عربي
['7X', 'عربي']
['عربي', '7X']
So it looks like it's not JupyterLab (this repo), but something much more fundamental. In fact, trying with pure python (i.e., no Jupyter involved) also gives the issue for me.
Python 3.6.4 | packaged by conda-forge | (default, Dec 23 2017, 16:54:01)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> en = '7X'
>>> print (en)
7X
>>> ar = 'عربي'
>>> print (ar)
عربي
>>> print ([en, ar])
['7X', 'عربي']
>>> print ([ar, en])
['عربي', '7X']
If you try this in pure python at the command line, does it also give the problem for you? If so, it sounds like it is a much deeper issue with the language, not with Jupyter.
If it is a python problem, it's best to report it to the python issue tracker: https://bugs.python.org/
Hang on, when I try it in both python and IPython, I get the expected result. When I try it in the classic notebook or JupyterLab, I get the incorrect result.
So on my machine, at least, it is a notebook problem.
Thanks for confirming! What version of python are you using? I'm using the conda-forge 3.6.4-0 package for macOS, in the OS X terminal (in case that has anything to do with it).
I am using 3.6.0 at the moment, on Ubuntu.
Are you using the same python that the notebook and lab are using? Or are you using the system python in one case and a different python in the other case?
It's the same conda-installed python.
In contrast, when I use the macOS system python, things appear to work correctly:
Python 2.7.11 (default, Dec 26 2015, 17:47:15)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> en = '7X'
>>> print (en)
7X
>>> ar = 'عربي'
>>> print (ar)
عربي
>>> print ([en, ar])
['7X', '\xd8\xb9\xd8\xb1\xd8\xa8\xd9\x8a']
>>> print ([ar, en])
['\xd8\xb9\xd8\xb1\xd8\xa8\xd9\x8a', '7X']
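A note on why the Python 2 session above looks fine: there the Arabic literal is a byte string, and the repr of a list escapes every non-ASCII byte, so the terminal never receives any right-to-left characters to reorder. A minimal sketch of the same effect in Python 3, using the built-in `ascii()` (the variable names are only for illustration):

```python
# The Arabic word from the original example, written with escapes.
ar = "\u0639\u0631\u0628\u064a"
en = "7X"

# ascii() behaves like repr() but escapes every non-ASCII character,
# so the resulting text contains no right-to-left characters at all.
escaped = ascii([ar, en])
print(escaped)

# The UTF-8 encoding of the word; these are the bytes Python 2 showed
# in its list repr.
utf8_bytes = ar.encode("utf-8")
print(utf8_bytes)
```

So the Python 2 output sidesteps the display problem rather than handling bidirectional text correctly.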
@ian-r-rose - what exact python conda package are you using? (channel, build, etc.)
So you're using the Anaconda python package, not the conda-forge one? Can you try with the conda-forge one?
Still works fine with the conda-forge one:
Very weird, then. @oryxius - what do you see with python, and what is your OS and python package?
I can't read Arabic, but it does appear that the Arabic letters in @ian-r-rose's screenshot are different than in the example above. @ian-r-rose has
but the example in the original post has
(Also, @ian-r-rose, I noticed that your terminal is somewhat transparent, and I can read what's underneath - maybe useful for you to keep in mind when posting screenshots :).
Yeah, I noticed that, but since it's just this issue :)
@ian-r-rose how did you get the Arabic characters in your screenshot? They look totally different (but I don't know - maybe they are actually the same?)
I copied and pasted from the first post. I don't know how to get them otherwise.
I should note that when I paste them into the notebook interface they look the same as the initial screenshot, so...maybe flaky font rendering?
After some more digging:
I am wondering whether this is a browser bug: the message coming in from the websocket connection looks okay to me. But then when it is parsed it winds up wrong. In the normal JS console on Firefox, if I enter
JSON.parse("{ \"content\": [\"\u0639\u0631\u0628\u064a\", \"7X\"] }")
I get
content: Array [ "عربي", "7X" ]
Edit: this reproduces the issue in both Chrome and Firefox:
JSON.parse("\"'\u0639\u0631\u0628\u064a', '7X'\"")
@ian-r-rose - interestingly, your browser console experiment shows the bug in Firefox, but not in Chrome, for me.
Me too, though in Chrome the whole message still gets parsed incorrectly (for reasons I have not figured out)
Also, I have no idea why your python interpreter is also showing this bug, @jasongrout
This is getting more and more weird. It seems that there are bugs across multiple applications regarding this.
@jasongrout & @ian-r-rose Thank you both for the quick response. First, I am using this on Microsoft Surface Studio running Windows 10. I get the problem on both Jupyter Classic and JupyterLab which I have in WinPython-64bit-3.6.3.0Qt5. The code prints correctly on WinPython's Spyder IDE.
@jasongrout Yes, the Arabic in @ian-r-rose's screenshot does not display accurately. It is basically displaying the Arabic letters of the word in reverse order (or so it appears to the viewer) and in their unconnected forms. Normally this is a Unicode support issue in simple text terminals.
I wonder if, ironically, the lack of proper support for display is what makes it work.
Also, I tested it in four browsers: Chrome, Firefox, Edge, and Opera and it appears in all of them.
Yeah, I wonder if they got reversed in my copy-paste buffer.
It seems to me like this is indeed an RTL vs. LTR error, specifically in which parts of the string get assigned LTR and which get assigned RTL (cf. discussion here and here). If I enter
JSON.parse("\"\u0639X7\"")
it displays "عX7" as expected (or, at least, how I as an English speaker would expect). If, however, I enter
JSON.parse("\"\u06397X\"")
it displays "ع7X". That is to say, the numeric character gets assigned to the RTL portion of the string. In the original example, the browser string parser was not knowledgeable enough about Python syntax to know to assign the 7 to its list member, rather than the other one.
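To make the guess above a bit more concrete, here is a small sketch that prints the Unicode bidirectional class of every character in the printed list, using the standard `unicodedata` module. As I read the Unicode Bidirectional Algorithm (UAX #9), a European number (`EN`) whose nearest preceding strong character is an Arabic letter (`AL`) is treated as an Arabic number (rule W2), and the neutral characters between the Arabic word and that number then take the right-to-left direction (rule N1). That would explain why the `7` ends up in the RTL run while the `X` (class `L`) does not.

```python
import unicodedata

# The list from the original example, with the Arabic word as escapes.
ar = "\u0639\u0631\u0628\u064a"
text = repr([ar, "7X"])

# Bidi class of each code point, in logical (memory) order.
classes = [(ch, unicodedata.bidirectional(ch)) for ch in text]
for ch, bidi_class in classes:
    print(f"U+{ord(ch):04X} {bidi_class}")
```

The quotes, comma, and space between the two list members are all neutral or weak classes (`ON`, `CS`, `WS`), so nothing in the text itself tells the renderer where one Python string ends and the next begins.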
At least, this is my guess about what is going on. As for a fix, this seems really tough, since it is happening at a very low level in browser unicode support. I fear that fixes we would try would end up breaking other things.
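One possible user-side workaround, offered only as a sketch: wrap the repr of each element in Unicode directional isolates (U+2066 LEFT-TO-RIGHT ISOLATE and U+2069 POP DIRECTIONAL ISOLATE) before printing. The helper name `format_isolated` is hypothetical, and whether a given browser, terminal, or font honours the isolates is an assumption that would need testing in each front end.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def format_isolated(items):
    """Build a list-like display string with each element's repr isolated.

    Hypothetical helper: the isolates are placed around repr(item), not
    inside the string, because repr() would escape them as non-printable.
    """
    parts = [f"{LRI}{item!r}{PDI}" for item in items]
    return "[" + ", ".join(parts) + "]"


ar = "\u0639\u0631\u0628\u064a"
shown = format_isolated([ar, "7X"])
print(shown)
```

This changes the printed text (it now contains invisible formatting characters), so it is suited to display only, not to output that will be parsed or compared later.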
Here is how it displays in Spyder: The console basically aligns to the right any string or list that begins in Arabic.
cc @minrk and @Carreau , who have a deeper knowledge of Unicode than I.
And CC also @afshin, who can read Arabic, and @samarsultan, who did lots of work on bidi in the classic Notebook (e.g., https://github.com/jupyter/notebook/pull/2357, https://github.com/jupyter/notebook/issues/2178, https://github.com/jupyter/notebook/issues/2156)
Ouch just getting the ping and this one is nasty. The other thing we need to test is whether it works correctly when flipping the classic notebook interface to use RTL layout (which you can do in the command palette).
It looks like this is caused by mixed-direction typesetting. See https://www.w3.org/TR/alreq/#h_direction: " Arabic script is written from right to left. Numbers, even Arabic numbers, are written from left to right, as is text in a script that is normally left-to-right. When the main script is Arabic, the layout and structure of pages and documents are also set from right to left. "
All the data is OK, but the JavaScript object's toString function returns the wrong result.
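A quick way to check the claim that the data itself is intact, sketched in Python rather than in the browser console: parse the same JSON payload used in the earlier experiment and list the code points of each element in logical order. The `message` string here is a stand-in for the real websocket message, not a capture of it.

```python
import json

# Stand-in for the message content; the escapes are decoded by json.loads.
message = '{"content": ["\\u0639\\u0631\\u0628\\u064a", "7X"]}'
content = json.loads(message)["content"]

# Code points per element, in logical order, independent of any rendering.
codepoints = [[f"U+{ord(ch):04X}" for ch in item] for item in content]
print(codepoints)
```

If the code points come out in the expected order, as they do here, the elements are stored correctly and only their visual ordering on screen is affected.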
Hello everyone,
I am a computational linguist who uses Python and Jupyter to work on Arabic. Along with Spyder, I found it to be the best IDE for my purposes. Of course, the Markdown gives Jupyter a clear edge. But I recently ran into an issue when printing lists (and tuples and dictionaries) that contain Arabic string elements and alphanumeric elements. So for example if you run:
Somehow, if the alphanumeric element ends with letter(s), these jump and get displayed with the Arabic element. I have posted this issue on Stack Overflow, but haven't received any solutions so far.