twosigma / beakerx

Beaker Extensions for Jupyter Notebook
http://BeakerX.com
Apache License 2.0

Plots with some unicode symbols in title are not displayed. #5983

Closed: altavir closed this 7 years ago

altavir commented 7 years ago

Very bizarre error. Some of the plots in my old Beaker notebook are not displayed (no error; in fact, no output cell at all). I found that the ones that are not displayed are those that have some specific Cyrillic letters in the title. If I avoid specific letters like с or я, everything works fine.

To reproduce, try:

new Plot(title: "я")

The system locale is Windows-1252.

EDIT: There is in fact an error stack trace in the console:

Traceback (most recent call last):
      File "c:\anaconda\lib\site-packages\tornado\web.py", line 1467, in _stack_context_handle_exception
        raise_exc_info((type, value, traceback))
      File "<string>", line 4, in raise_exc_info
      File "c:\anaconda\lib\site-packages\tornado\stack_context.py", line 316, in wrapped
        ret = fn(*args, **kwargs)
      File "c:\anaconda\lib\site-packages\zmq\eventloop\zmqstream.py", line 191, in <lambda>
        self.on_recv(lambda msg: callback(self, msg), copy=copy)
      File "c:\anaconda\lib\site-packages\notebook\services\kernels\handlers.py", line 299, in _on_zmq_reply
        msg = self.session.deserialize(fed_msg_list)
      File "c:\anaconda\lib\site-packages\jupyter_client\session.py", line 926, in deserialize
        message['content'] = self.unpack(msg_list[4])
      File "c:\anaconda\lib\site-packages\jupyter_client\session.py", line 105, in <lambda>
        json_unpacker = lambda s: jsonapi.loads(s)
      File "c:\anaconda\lib\site-packages\zmq\utils\jsonapi.py", line 54, in loads
        s = s.decode('utf8')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 172: invalid continuation byte
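The last line of the trace can be reproduced in isolation. In UTF-8, я is the two bytes 0xD1 0x8F, and a lead byte 0xD1 must be followed by a continuation byte in the range 0x80 to 0xBF. The sketch below assumes, purely for illustration, that the second byte arrived as `?` (0x3F); any byte outside the continuation range gives the same "invalid continuation byte" message.

```python
# Reproduce the last line of the traceback in isolation.
# Assumption (illustration only): the byte after 0xD1 arrived as "?" (0x3F).
correct = "\u044f".encode("utf-8")  # "я" -> b'\xd1\x8f': lead byte + continuation byte
damaged = b"\xd1?"                  # lead byte 0xD1 followed by a non-continuation byte

print(repr(correct), correct.decode("utf-8") == "\u044f")

message = None
try:
    damaged.decode("utf-8")  # the same call zmq's jsonapi.loads makes: s.decode('utf8')
except UnicodeDecodeError as err:
    message = str(err)
print(message)
```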
scottdraves commented 7 years ago

Thanks for the report, I'll try to figure it out. I can't reproduce it in my own locale, but hopefully we can track it down.

rbidas commented 7 years ago

@altavir Could you provide me with:

I tried to reproduce this in my environments, Linux (Ubuntu) and Windows 10, and it works.

altavir commented 7 years ago

Here is a gist of it: https://nbviewer.jupyter.org/gist/anonymous/c42422a06c578100d0eda878ff7ccf24

The first cell does not work, but the second one works fine.

And here is the result of the command:

>conda list ipython
# packages in environment at C:\Anaconda:
#
ipython                   5.3.0                    py36_0
ipython                   6.2.0                     <pip>
ipython_genutils          0.2.0                    py36_0
rbidas commented 7 years ago

@altavir Could you update IPython to version 5.4 or higher? There was a bug in IPython (https://github.com/ipython/ipython/pull/10558) that was fixed in 5.4: .ipynb files are saved in UTF-8 but read using the system locale.

I use IPython 6.1.0 and plots work fine.

altavir commented 7 years ago

I am a bit confused because I don't understand why pip uses one version and conda uses another one. Still, I upgraded ipython in conda:

> conda list ipython
# packages in environment at C:\Anaconda:
#
ipython                   6.2.0                     <pip>
ipython                   6.1.0                    py36_0
ipython_genutils          0.2.0                    py36_0

But I still get the same error. I do not think that it is the problem you have described:

  1. It works fine with some Cyrillic symbols. All Cyrillic symbols are 2 bytes in UTF-8, so it should be broken for each of them.
  2. I do not save and reload the notebook. I just create it in-memory.
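Point 1 does have a plausible explanation. Assuming (this is not confirmed in the thread) that the JVM kernel decodes the incoming code and encodes its output with the windows-1252 default, most two-byte UTF-8 sequences pass through unchanged, because both bytes happen to be defined windows-1252 characters and map back to the same bytes. windows-1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, though, so a letter whose UTF-8 form contains one of those bytes gets replaced and the UTF-8 decode on the Python side fails. The sketch models that round trip with Python's `cp1252` codec; the function names are hypothetical.

```python
# Hypothetical model (an assumption, not confirmed in the thread): the kernel
# decodes incoming UTF-8 bytes and encodes outgoing text with the platform
# default charset, windows-1252 ("cp1252" in Python).
def round_trip_through_cp1252(text: str) -> bytes:
    incoming = text.encode("utf-8")                          # bytes from the frontend
    as_string = incoming.decode("cp1252", errors="replace")  # undefined bytes -> U+FFFD
    return as_string.encode("cp1252", errors="replace")      # U+FFFD -> b"?"

def survives(letter: str) -> bool:
    """True if the bytes sent back still decode as UTF-8 to the same letter."""
    try:
        return round_trip_through_cp1252(letter).decode("utf-8") == letter
    except UnicodeDecodeError:
        return False

# Lowercase Cyrillic а..я is U+0430..U+044F; every one is two bytes in UTF-8.
lowercase = [chr(cp) for cp in range(0x0430, 0x0450)]
broken = [letter for letter in lowercase if not survives(letter)]
print(ascii(broken))  # ['\u0441', '\u044d', '\u044f'], i.e. с, э, я
```

Under this model the broken lowercase letters are exactly с, э and я, which agrees with the two letters reported at the top of the issue.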
altavir commented 7 years ago

It seems to me that this is not a file-reading problem but a miscommunication between the JVM and Python. My own experience shows that when working with Java or Groovy, one should never rely on the default string-to-byte-array conversion, or vice versa. You should always provide an explicit encoding.

rbidas commented 7 years ago

OK, I will try to reproduce this. Could you run one test? Please run this in a Python 3 notebook:

from beakerx import *
Plot(title= "я")
altavir commented 7 years ago

I should have thought about it myself. Silly me. Well, it works just fine with Python, which supports my point that the problem is somewhere in the Java serialization.

rbidas commented 7 years ago

Thanks, this will help us find the problem. Is it only a Groovy problem, or are other JVM kernels also affected?

altavir commented 7 years ago

I tried Kotlin on another computer: same problem. Additionally, the Cyrillic letters that do work are displayed as ? (they were displayed correctly the last time I tried Groovy, but that was on another computer with the same configuration).

altavir commented 7 years ago

I've fixed the Groovy kernel by manually adding -Dfile.encoding=UTF8 to kernel.json. The problem is indeed the Java default charset. It is probably easier to specify the encoding in the Java parameters than to modify the code.

michalgce commented 7 years ago

The problem is related to encoding. To solve it we have two choices. One is adding, as @altavir said, a parameter that sets the encoding (because the default is the local encoding). The second would be catching the incoming code, converting it to bytes and then encoding it as UTF-8, but I'm not sure that approach works correctly.
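One reading of the second option is to take the string the kernel has already decoded, turn it back into bytes, and decode those as UTF-8. The sketch below (Python's `cp1252` codec standing in for the JVM's windows-1252 default; `kernel_reads` and `try_repair` are hypothetical names) suggests why that is fragile: it repairs letters whose bytes all exist in windows-1252, but for a letter such as я the undefined byte was already replaced during the first decode, so the original information is gone.

```python
# Assumption: the kernel decoded the incoming UTF-8 code with windows-1252,
# replacing undefined bytes with U+FFFD (Java's String(byte[], Charset) also
# substitutes U+FFFD for bytes it cannot decode).
def kernel_reads(code: bytes) -> str:
    return code.decode("cp1252", errors="replace")

# The after-the-fact fix: back to bytes, then decode those bytes as UTF-8.
def try_repair(garbled: str) -> str:
    return garbled.encode("cp1252", errors="replace").decode("utf-8", errors="replace")

for letter in ("\u0430", "\u044f"):  # "а" and "я"
    garbled = kernel_reads(letter.encode("utf-8"))
    repaired = try_repair(garbled)
    print(ascii(letter), ascii(garbled), ascii(repaired), repaired == letter)
```

Repairing later can only work when nothing was replaced on the way in, which is why fixing the charset at the point of decoding is the more reliable option.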

scottdraves commented 7 years ago

Thanks for the clue, altavir! I prefer fixing it in the code. Somewhere we are using String.getBytes(), InputStreamReader, or OutputStreamWriter without specifying UTF-8...
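The Java overloads that take an explicit charset include `String.getBytes(StandardCharsets.UTF_8)`, `new InputStreamReader(in, StandardCharsets.UTF_8)` and `new OutputStreamWriter(out, StandardCharsets.UTF_8)`. As an analogy only (it does not locate the offending calls in BeakerX), the Python sketch below uses `io.TextIOWrapper` in place of those reader and writer classes; `write_text` and `read_text` are hypothetical helper names. Text survives whenever both ends name the same charset, and is garbled as soon as one side falls back to a locale default.

```python
import io

def write_text(text: str, encoding: str) -> bytes:
    """Encode text through a writer with an explicit charset
    (the role OutputStreamWriter(out, charset) plays in Java)."""
    buffer = io.BytesIO()
    writer = io.TextIOWrapper(buffer, encoding=encoding, errors="replace")
    writer.write(text)
    writer.flush()            # push the encoded bytes into the buffer
    data = buffer.getvalue()
    writer.close()            # also closes the underlying BytesIO
    return data

def read_text(data: bytes, encoding: str) -> str:
    """Decode bytes through a reader with an explicit charset
    (the role InputStreamReader(in, charset) plays in Java)."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding, errors="replace") as reader:
        return reader.read()

cyrillic = "".join(chr(cp) for cp in range(0x0410, 0x0450))  # А..Я and а..я
wire = write_text(cyrillic, "utf-8")

same_charset = read_text(wire, "utf-8")    # both ends agree on UTF-8
mixed_charset = read_text(wire, "cp1252")  # reader falls back to windows-1252
print(same_charset == cyrillic, mixed_charset == cyrillic)  # True False
```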

michalgce commented 7 years ago

@altavir could you tell me the output of this command in the Groovy/Java kernel:

import java.nio.charset.Charset;
System.out.println("Default Charset=" + Charset.defaultCharset());
altavir commented 7 years ago

It should be Windows-1252... Indeed it is:

jshell> System.out.println("Default Charset=" + Charset.defaultCharset());
Default Charset=windows-1252
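windows-1252 has no Cyrillic letters at all, so any code path that encodes correctly decoded Cyrillic text with this default can only emit replacement characters. That is one possible source of the ? output mentioned for Kotlin, though the thread does not confirm it. The sketch uses Python's `cp1252` codec as a stand-in for the JVM's windows-1252 charset; `errors="replace"` mirrors `String.getBytes()`, which substitutes `?` for characters the charset cannot represent.

```python
title = "\u043f\u0440\u0438\u0432\u0435\u0442"  # "привет"

# Strict encoding shows that windows-1252 cannot represent the characters at all.
try:
    title.encode("cp1252")
    representable = True
except UnicodeEncodeError:
    representable = False

# Lenient encoding, like Java's String.getBytes() with a windows-1252 default:
# every unmappable character becomes "?".
lenient = title.encode("cp1252", errors="replace")
print(representable, lenient)  # False b'??????'
```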
michalgce commented 7 years ago

@altavir Unfortunately, we are not able to reproduce your problem. Could you test this branch and tell me how it works? https://github.com/twosigma/beakerx/tree/michal/5983

scottdraves commented 7 years ago

@altavir ping?

altavir commented 7 years ago

Sorry, I was away for a week. I think I managed to do a developer install of this branch, but the problem seems to still be there. I installed everything into a separate Anaconda environment and ran it from there. Both Groovy and Kotlin produce the error. Kotlin also generates an additional warning:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.intellij.util.text.StringFactory (file:/D:/temp/beakerx/beakerx/beakerx/static/kernel/kotlin/lib/kotlin-compiler-1.1.3.jar) to constructor java.lang.String(char[],boolean)
WARNING: Please consider reporting this to the maintainers of com.intellij.util.text.StringFactory
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release