nteract / papermill

📚 Parameterize, execute, and analyze notebooks
http://papermill.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Huge notebook sizes with keplergl-jupyter #802

Open wrichert opened 1 month ago

wrichert commented 1 month ago

🚀 Feature

I'm using papermill to generate KeplerGL HTML files. Papermill seems to persist the entire internal widget state of the Kepler map in the output notebook.

from keplergl import KeplerGl

# KEPLER_CONFIG and data are defined earlier in the notebook
kepler_map = KeplerGl(height=700)
kepler_map.config = KEPLER_CONFIG
kepler_map.add_data(data=data, name="data")
kepler_map.save_to_html(file_name="kepler.html", read_only=False)

Inspecting the .ipynb file generated by papermill, I see this:

"metadata": {
  ...
    "453e3a2bf8594314abd4080df83485c3": {
      "model_module": "keplergl-jupyter",
      "model_module_version": "^0.3.2",
      "model_name": "KeplerGlModal",
      "state": {
       "_dom_classes": [],
       "_model_module": "keplergl-jupyter",
       "_model_module_version": "^0.3.2",
       "_model_name": "KeplerGlModal",
       "_view_count": null,
       "_view_module": "keplergl-jupyter",
       "_view_module_version": "^0.3.2",
       "_view_name": "KeplerGlView",
...
      "layers": [
...

which basically stores all of the kepler data again.
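To see how much of an output notebook is taken up by saved widget state, a small inspection helper can be sketched as follows. It assumes the state sits in the notebook-level metadata under `widgets` and the `application/vnd.jupyter.widget-state+json` key, which is where Jupyter widgets normally save it; the excerpt above elides that part of the path, so treat this as an assumption. The helper name `widget_state_sizes` and the example notebook are hypothetical.

```python
import json

# Assumption: saved widget state lives under this notebook-level metadata key.
WIDGET_STATE_MIME = "application/vnd.jupyter.widget-state+json"


def widget_state_sizes(notebook):
    """Return the serialized size in bytes of each saved widget model."""
    widgets = notebook.get("metadata", {}).get("widgets", {})
    state = widgets.get(WIDGET_STATE_MIME, {}).get("state", {})
    return {
        model_id: len(json.dumps(model).encode("utf-8"))
        for model_id, model in state.items()
    }


# Hypothetical in-memory notebook; a real one would be loaded with json.load().
example_notebook = {
    "cells": [],
    "metadata": {
        "widgets": {
            WIDGET_STATE_MIME: {
                "state": {
                    "abc123": {
                        "model_module": "keplergl-jupyter",
                        "model_name": "KeplerGlModal",
                        "state": {"data": {"data": {"columns": ["a"], "data": [[1], [2]]}}},
                    }
                }
            }
        }
    },
}

sizes = widget_state_sizes(example_notebook)
# Largest models first, to spot what dominates the file size.
for model_id, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(model_id, size)
```

Running this on a papermill output file would show whether a single widget model accounts for most of the notebook's size.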

Motivation

It would be great if I could tell papermill not to store this widget metadata, so that the output notebook does not carry unnecessary data, which in some cases amounts to gigabytes.

MSeal commented 3 weeks ago

Some sort of strip-metadata option would make sense so it could be controlled. Happy to review a PR that adds this, though I'm not available for a while to write it myself.
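As a possible workaround that does not depend on papermill itself, the output notebook can be post-processed to drop the saved widget state. The sketch below uses only the standard `json` module and assumes the state lives under the notebook-level `metadata["widgets"]` key. The function names `strip_widget_state` and `strip_notebook_file` are hypothetical and are not part of papermill.

```python
import json


def strip_widget_state(notebook):
    """Remove saved widget state from the notebook-level metadata, in place."""
    # Assumption: widget state is stored under metadata["widgets"].
    notebook.get("metadata", {}).pop("widgets", None)
    return notebook


def strip_notebook_file(src_path, dst_path):
    """Read a notebook file, drop its widget state, and write the result."""
    with open(src_path, encoding="utf-8") as f:
        notebook = json.load(f)
    strip_widget_state(notebook)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)


# Hypothetical usage after a papermill run:
# strip_notebook_file("output.ipynb", "output_stripped.ipynb")
```

One trade-off is that widget views referenced from cell outputs will no longer render from the saved file once their state is removed. That is likely acceptable when the notebook is only used to produce the HTML export.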