vispy / jupyter_rfb

Remote Frame Buffer for Jupyter
https://jupyter-rfb.readthedocs.io
MIT License
60 stars 12 forks source link

feat: avoid base64-encoding image data #76

Closed manzt closed 1 year ago

manzt commented 1 year ago

Implements #75. Uses Jupyter Widgets support for sending binary data directly over websockets.

manzt commented 1 year ago

Also one question: would this also work in other clients like Colab (https://github.com/vispy/jupyter_rfb/issues/74) or MS Code?

Yes. VS Code and Colab correctly implement the Jupyter Widgets, it's just there are a lot of quirks you need to worry about as a widget author to make sure your front-end code is packaged correctly and discoverable in those environments.

Here's an example notebook using https://github.com/manzt/anywidget – a project I work on that handles those complexities for you – that sends binary data in Colab.

almarklein commented 1 year ago

One linting error: ./jupyter_rfb/widget.py:17:1: F401 'base64.encodebytes' imported but unused 😉

And yarn seems to fail (on some builds).

manzt commented 1 year ago

And yarn seems to fail (on some builds).

yarn builds are failing because you don't have a lock file and one of the dependencies is now incompatible with node v14. I bumped the version in CI, but you might want to consider checking in a lockfile.

djhoese commented 1 year ago

I didn't find too much on previous conversations on this topic so let me know if I'm missing something that I can read in a different thread: The related issues talks about JPEG only, but this PR sends "raw" JPEG and PNG, right? Unless I'm mis-remembering, I think base64 was originally used because that's what vispy used to use in the old webgl backend that this package was originally created to support. This PR is basically just saying "we can send binary data, so send binary data", right?

almarklein commented 1 year ago

The related issues talks about JPEG only, but this PR sends "raw" JPEG and PNG, right?

Yes.

I think base64 was originally used because that's what vispy used to use in the old webgl backend that this package was originally created to support.

I used base64 because its a common way to be able to send binary data over a textual interface. And in the case of images, because it allows creating an url that embeds the image (so no decoding necessary in JS). I've seen/used the same technique in similar situations, and I guess I hadn't thought about using the ability to send binary data over the websocket ...

But now that I think of it ... are we certain that the websocket is really in binary mode, and that Jupyter is not base64 encoding the data anyway (and decoding in the browser). Can we observe an increase in performance with this change (e.g. in performance.ipynb)?

manzt commented 1 year ago

But now that I think of it ... are we certain that the websocket is really in binary mode, and that Jupyter is not base64 encoding the data anyway (and decoding in the browser).

Fairly certain it sends the binary data and there is no hidden base64 encoding. cc: @maartenbreddels

Also not familiar with being able to turn off/on binary support for a WebSocket. WebSocket.binaryType property controls the type of binary data being received over the connection, but the only options are blob and arrayBuffer.

https://github.com/jupyterlab/jupyterlab/blob/c68fa8e2df6d15d6329d6337ac30f520e371a620/packages/services/src/kernel/default.ts#L1276

almarklein commented 1 year ago

I picked this up, eventually making this work with both approaches.

I ran this many times on the examples/performance.ipynb. Now this may not be a complete benchmark, and I only tested on localhost and MacOS .... but ... what I see is that using a websocket is actually a bit slower. I was not expecting that, and its not a lot, but its significant.

My guess is that the object-url's cause unexpected overhead or something. I also saw that the time spent on base64 encoding is much smaller (like an order of magnitude) than the jpeg compression.

Leaving this open for now in case someone else wants to have a stab.

djhoese commented 1 year ago

I also saw that the time spent on base64 encoding is much smaller (like an order of magnitude) than the jpeg compression.

This is a function of the size of the frame/image, right? The performance notebook is 480x640 images. I'd also assume this is heavily dependent on the load of the server. So I guess my questions are:

  1. How big of a frame do you have to make for base64 to be on-par with JPEG compression?
  2. What is the difference in amount of data sent over the network? Obviously locally isn't too big of a deal, but over a non-local network the UX could be different.
almarklein commented 1 year ago

Good questions! Also worth looking at how things work out with png. I'll have another look soon.

almarklein commented 1 year ago

what I see is that using a websocket is actually a bit slower. I was not expecting that, and its not a lot, but its significant.

Sorry, my above comment was based on very unrealistic tests.

Did a bit more testing. The size of the frames does not seem to matter much. What does matter a lot is when it sends random data instead. Then the websocket approach becomes twice as fast 😄. Although fully random data is not realistic, a nearly black image is neither, most real data will be somewhere in between.

I'm now confident that in general the websockt approach is faster, especially in cases where the server is remote. I've left both options open, so in case one is in a situation where a websocket is problematic, it can still be turned off. By default it's on though.

Ready to go from my end.