jupyter-widgets / ipydatagrid

Fast Datagrid widget for the Jupyter Notebook and JupyterLab
BSD 3-Clause "New" or "Revised" License

Improve data serialization #483

Closed: martinRenou closed this pull request 8 months ago

martinRenou commented 9 months ago

Make ipydatagrid more performant, achieving two things:

What's remaining to make the PR ready to review:

In follow up PRs, the next items should be resolved:

ianthomas23 commented 8 months ago

I have tried this locally and I see the same dramatic speed improvements. It would be good to continue with this, as it will provide a solid basis for the experiments with backend filtering and sorting that I'd like to look at.

paddymul commented 8 months ago

This week I have been working to better understand binary serialization from pandas through ipywidgets to JS. I think I'm going to use arrow-js. I'm hoping to publish a very rough early repo later today.

I'm currently fleshing out a small ipywidgets library for prototyping simple examples. Because it's small, it should be easier to collaborate on with other people.

Trevor Manz and Kyle Barron have been doing work in this space too.

I'd love to collaborate with others on this.

paddymul commented 8 months ago

FWIW, I just pushed the first commits to df_cereal, a serialization playground: https://github.com/paddymul/df_cereal

I have examples of arrow-js serialization working entirely in JS. I currently can't get the Python side to send bytes or base64 to JS.
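When raw binary buffers are not an option, one fallback is to base64-encode the bytes inside a JSON message. The standard-library sketch below shows the cost of that fallback: base64 emits 4 characters for every 3 input bytes (rounded up with padding), roughly 33% larger, and the receiver must decode it. The helpers `encode_for_json`/`decode_from_json` and the message shape are hypothetical. Note that the ipywidgets comm protocol can also carry `bytes` and `memoryview` values as separate binary buffers next to the JSON state, which avoids this overhead.

```python
import base64
import json


def encode_for_json(payload: bytes) -> str:
    """Hypothetical helper: wrap raw bytes in a JSON-safe message via base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return json.dumps({"encoding": "base64", "data": encoded})


def decode_from_json(message: str) -> bytes:
    """Inverse of encode_for_json, as the receiving side would do it."""
    obj = json.loads(message)
    return base64.b64decode(obj["data"])


raw = bytes(range(256)) * 4  # 1024 bytes of arbitrary binary data
message = encode_for_json(raw)
encoded_len = len(json.loads(message)["data"])
print(len(raw), encoded_len)  # → 1024 1368, since 4 * ceil(1024 / 3) == 1368
```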

Benchmarks and more docs coming soon.

BTW, I looked at what bqplot is doing. I suspect Arrow-based serialization will be much faster, since it doesn't deal with JSON at all.

martinRenou commented 8 months ago

Thank you for reaching out @paddymul. This looks interesting!

> will be much faster

I'm a tiny bit skeptical about this. The JSON message bqplot sends is minimal in the end.
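To make the point about message size concrete: a binary scheme in this style keeps only a little metadata in JSON (a dtype and a shape) and ships the column data as a raw buffer. The sketch below illustrates that general pattern with the standard library's `array` module. It is not bqplot's actual serializer; the header fields `dtype` and `shape` and the helper names are assumptions for illustration, and it assumes both ends share the same (in practice little-endian) byte order.

```python
import json
from array import array


def serialize_column(values: list[float]) -> tuple[str, bytes]:
    """Hypothetical serializer: small JSON header plus a raw float64 buffer."""
    buf = array("d", values)  # 'd' is a C double, i.e. an 8-byte float64
    header = json.dumps({"dtype": "float64", "shape": [len(buf)]})
    return header, buf.tobytes()  # native byte order


def deserialize_column(header: str, payload: bytes) -> list[float]:
    """Rebuild the column from the JSON header and the binary buffer."""
    meta = json.loads(header)
    if meta["dtype"] != "float64":
        raise ValueError(f"unsupported dtype: {meta['dtype']}")
    buf = array("d")
    buf.frombytes(payload)
    if len(buf) != meta["shape"][0]:
        raise ValueError("buffer length does not match header shape")
    return buf.tolist()


values = [i * 0.5 for i in range(10_000)]
header, payload = serialize_column(values)
print(len(header), len(payload))  # the JSON part stays tiny however long the column is
```

On the JS side, the same buffer can be viewed without parsing via `new Float64Array(buffer)`, so the JSON that actually gets parsed is just the short header.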

I feel we should go ahead with this PR once it passes all tests. After that, I'm 💯 for continuing the discussion about a common place for better binary serialization that we can use across widgets. I don't like depending on bqplot for this, but it was already a dependency for some reason (probably a legacy dependency left over from removed code), so it's convenient to just use it for now.