jupyter-widgets / ipywidgets

Interactive Widgets for the Jupyter Notebook
https://ipywidgets.readthedocs.io
BSD 3-Clause "New" or "Revised" License

FileUpload does not work on 2.7 #2533

Open mlucool opened 5 years ago

mlucool commented 5 years ago

ipywidgets still supports Python 2.7, but the FileUpload widget does not work there, apparently because of unicode vs. bytes differences.

When I upload any file, I see:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/usr/local/python/python-2.7/std/lib/python2.7/site-packages/ipywidgets/widgets/widget.py in _handle_msg(self, msg)
    674                 if 'buffer_paths' in data:
    675                     _put_buffers(state, data['buffer_paths'], msg['buffers'])
--> 676                 self.set_state(state)
    677 
    678         # Handle a state request.

/usr/local/python/python-2.7/std/lib/python2.7/site-packages/ipywidgets/widgets/widget.py in set_state(self, sync_data)
    543                     from_json = self.trait_metadata(name, 'from_json',
    544                                                     self._trait_from_json)
--> 545                     self.set_trait(name, from_json(sync_data[name], self))
    546 
    547     def send(self, content, buffers=None):

/opt/python/python-2.7/lib64/python2.7/contextlib.py in __exit__(self, type, value, traceback)
     22         if type is None:
     23             try:
---> 24                 self.gen.next()
     25             except StopIteration:
     26                 return

/usr/local/python/python-2.7/std/lib/python2.7/site-packages/traitlets/traitlets.py in hold_trait_notifications(self)
   1129                 for changes in cache.values():
   1130                     for change in changes:
-> 1131                         self.notify_change(change)
   1132 
   1133     def _notify_trait(self, name, old_value, new_value):

/usr/local/python/python-2.7/std/lib/python2.7/site-packages/ipywidgets/widgets/widget.py in notify_change(self, change)
    601         if self.comm is not None and self.comm.kernel is not None:
    602             # Make sure this isn't information that the front-end just sent us.
--> 603             if name in self.keys and self._should_send_property(name, getattr(self, name)):
    604                 # Send new state to front-end
    605                 self.send_state(key=name)

/usr/local/python/python-2.7/std/lib/python2.7/site-packages/ipywidgets/widgets/widget.py in _should_send_property(self, key, value)
    652             # idiosyncracies of how python data structures map to json, for example
    653             # tuples get converted to lists.
--> 654             if (jsonloads(jsondumps(split_value[0])) == split_lock[0]
    655                 and split_value[1] == split_lock[1]
    656                 and _buffer_list_equal(split_value[2], split_lock[2])):

/opt/python/python-2.7/lib64/python2.7/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, encoding, default, sort_keys, **kw)
    242         cls is None and indent is None and separators is None and
    243         encoding == 'utf-8' and default is None and not sort_keys and not kw):
--> 244         return _default_encoder.encode(obj)
    245     if cls is None:
    246         cls = JSONEncoder

/opt/python/python-2.7/lib64/python2.7/json/encoder.py in encode(self, o)
    205         # exceptions aren't as detailed.  The list call should be roughly
    206         # equivalent to the PySequence_Fast that ''.join() would do.
--> 207         chunks = self.iterencode(o, _one_shot=True)
    208         if not isinstance(chunks, (list, tuple)):
    209             chunks = list(chunks)

/opt/python/python-2.7/lib64/python2.7/json/encoder.py in iterencode(self, o, _one_shot)
    268                 self.key_separator, self.item_separator, self.sort_keys,
    269                 self.skipkeys, _one_shot)
--> 270         return _iterencode(o, 0)
    271 
    272 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: invalid start byte

Is it possible to fix this?

jasongrout commented 5 years ago

I see the error too in python 2.7 when uploading a binary file (uploading a text file seems okay). It seems like the problematic lines are https://github.com/jupyter-widgets/ipywidgets/blob/c592184935781d0787bf622bf1659279c4a8b531/ipywidgets/widgets/widget.py#L654

It's odd: I would have expected the lines above it, which strip out the binary buffers, to take care of the issue. Here's a theory: in python 2, strings and binary buffers are the same type, so a binary value that arrives as a plain string is not recognized as a buffer and does not get stripped out in this check function. We then ask the json library to dump the state, which still includes the binary buffer (aka string), but the json dumps function expects a utf-8 string. That then causes the error.

If that theory is right, we'd see the same issues with any client-side widget transferring a binary value to the kernel (e.g., images, etc.).
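To make the theory concrete, here is a minimal sketch of the decoding step, written for Python 3 so it can be run today. It assumes that Python 2's json.dumps effectively decodes byte strings as UTF-8 before encoding them. The helper name decode_as_utf8 is hypothetical, and the PNG signature is only sample data chosen because its first byte is 0x89, the byte reported in the traceback; the thread does not say what file was actually uploaded.

```python
# Sample binary data: the 8-byte PNG signature, whose first byte is 0x89.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
TEXT_BYTES = b"plain text upload"


def decode_as_utf8(raw):
    """Hypothetical helper: decode bytes as UTF-8, which is assumed to be
    roughly what Python 2's json.dumps does with a byte string."""
    return raw.decode("utf-8")


# ASCII-only content decodes cleanly, consistent with text uploads working.
decoded_text = decode_as_utf8(TEXT_BYTES)

# Binary content is expected to fail on the very first byte.
try:
    decode_as_utf8(PNG_SIGNATURE)
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(decoded_text)
print(decode_error)
```

If the assumption holds, any widget value that reaches the JSON comparison as a raw byte string containing non-UTF-8 bytes would fail the same way, not just FileUpload.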

I'm not sure of a good way around this. We'd need to distinguish between binary buffers and real strings, and that distinction is exactly one of the major reasons for python 3.

jasongrout commented 5 years ago

It looks like these are the types our code looks for and strips out in python 2: https://github.com/jupyter-widgets/ipywidgets/blob/master/ipywidgets/widgets/widget.py#L59

I'm assuming the problematic thing here is the file's value. If so, why is the file's value not a bytearray or memoryview in python 2?
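As a rough illustration of why the value's type matters, the sketch below strips buffers from a state dict by type. It is not the ipywidgets implementation: split_buffers is a hypothetical, flat (non-recursive) helper, and the type tuple (memoryview, bytearray) is an assumption about what a Python 2 style check would look for.

```python
def split_buffers(state, binary_types):
    """Hypothetical helper: move values whose type is in binary_types out of
    a flat state dict, returning (remaining_state, buffers)."""
    remaining, buffers = {}, {}
    for key, value in state.items():
        if isinstance(value, binary_types):
            buffers[key] = value
        else:
            remaining[key] = value
    return remaining, buffers


payload = b"\x89PNG\r\n\x1a\n"

# Assumed Python 2 style type list: plain byte strings are not included,
# because there they cannot be told apart from ordinary text strings.
narrow_types = (memoryview, bytearray)

state_raw = {"description": "Upload", "data": payload}
state_wrapped = {"description": "Upload", "data": memoryview(payload)}

# A raw byte string is left in the state that would later be JSON-encoded.
remaining_raw, buffers_raw = split_buffers(state_raw, narrow_types)

# The same bytes wrapped in a memoryview are recognized and pulled out.
remaining_wrapped, buffers_wrapped = split_buffers(state_wrapped, narrow_types)
```

Under these assumptions, the upload would only survive the check if the file's value arrived as a bytearray or memoryview rather than as a plain string.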

jasongrout commented 5 years ago

Anyway, hopefully that is some context for whoever wants to dig into this.

We do plan to release ipywidgets 8 before the end of the year, which will be python 3 only.