Open huonw opened 3 years ago
Thanks for the deep look at this issue! @jonmmease what do you think? I think we could probably implement a lot of this so long as the docstrings are still readable, right?
Thanks for the quick response!
Two other potential options I thought of over the weekend could be:
- compress the files (`gzip -9 -c _stream.py | wc -c` reports 1163, i.e. less than 30% of the original), and lazily decompress and exec them on import, somehow (might require Python 3.7)
- [...] `sys.path` in a way that may be fragile to support Python 3.6 (whereas Python 3.7+ might be able to be fancier and use `zipimport.zipimporter.load_module` directly)
Thanks for taking a look at this @huonw. I'd have no problem running the generated code through a minimizer instead of black if that's helpful. The compression approaches would carry a bit more breakage risk I think, so that would take some care.
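For anyone wanting to experiment with the "compress, then lazily decompress and exec on import" idea, here is a minimal sketch of what it could look like. This is not how plotly works today; the `.py.gz` layout and the names `GzipSourceFinder` and `GzipSourceLoader` are hypothetical, and the sketch only handles flat top-level modules, not packages.

```python
import gzip
import importlib.abc
import importlib.util
import sys
from pathlib import Path


class GzipSourceLoader(importlib.abc.Loader):
    """Hypothetical loader: executes a module stored as gzip-compressed source."""

    def __init__(self, path):
        self._path = Path(path)

    def create_module(self, spec):
        # Returning None asks the import system to create a default module object.
        return None

    def exec_module(self, module):
        # Decompression happens only when the module is actually imported.
        source = gzip.decompress(self._path.read_bytes())
        code = compile(source, str(self._path), "exec")
        exec(code, module.__dict__)


class GzipSourceFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: looks for <root>/<name>.py.gz (assumed layout)."""

    def __init__(self, root):
        self._root = Path(root)

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:
            # Packages and submodules are out of scope for this sketch.
            return None
        candidate = self._root / (fullname + ".py.gz")
        if not candidate.is_file():
            return None
        return importlib.util.spec_from_loader(fullname, GzipSourceLoader(candidate))
```

Appending a `GzipSourceFinder` instance to `sys.meta_path` makes a plain `import some_module` pick up `some_module.py.gz` from the given directory. A real version would also need to deal with packages, bytecode caching and tracebacks, which is presumably where the breakage risk mentioned above comes in.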
The biggest wins would probably be in detecting the use of identical objects throughout the figure hierarchy and sharing those classes.
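As a very rough first pass at finding sharing opportunities, one could look for byte-identical generated files. This is only a crude proxy for "identical objects in the figure hierarchy" (generated classes that differ only in their path strings would not be caught), and the function names below are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_sources(root):
    """Group .py files under `root` whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.py")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Keep only groups where at least two files share the same bytes.
    return [paths for paths in groups.values() if len(paths) > 1]


def duplicated_bytes(groups):
    """Bytes that would be saved if each group kept a single copy."""
    return sum(p.stat().st_size for paths in groups for p in paths[1:])
```

Running `duplicated_bytes(find_duplicate_sources("/tmp/plotly/plotly/validators"))` against an installed copy would give a lower bound on what sharing could save.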
Just a status check: this appears to have crept upwards (123MB in 5.1 -> 128MB in 5.8) with both `graph_objs` and `validators`:
pip install --target=/tmp/plotly/ plotly==5.8.0
du -sch /tmp/plotly/plotly/* | sort -h
Output:
4.0K /tmp/plotly/plotly/_version.py
4.0K /tmp/plotly/plotly/_widget_version.py
4.0K /tmp/plotly/plotly/animation.py
4.0K /tmp/plotly/plotly/config.py
4.0K /tmp/plotly/plotly/conftest.py
4.0K /tmp/plotly/plotly/dashboard_objs.py
4.0K /tmp/plotly/plotly/exceptions.py
4.0K /tmp/plotly/plotly/files.py
4.0K /tmp/plotly/plotly/grid_objs.py
4.0K /tmp/plotly/plotly/missing_ipywidgets.py
4.0K /tmp/plotly/plotly/optional_imports.py
4.0K /tmp/plotly/plotly/presentation_objs.py
4.0K /tmp/plotly/plotly/serializers.py
4.0K /tmp/plotly/plotly/session.py
4.0K /tmp/plotly/plotly/validator_cache.py
4.0K /tmp/plotly/plotly/version.py
4.0K /tmp/plotly/plotly/widgets.py
8.0K /tmp/plotly/plotly/__init__.py
8.0K /tmp/plotly/plotly/callbacks.py
8.0K /tmp/plotly/plotly/colors
8.0K /tmp/plotly/plotly/utils.py
12K /tmp/plotly/plotly/shapeannotation.py
12K /tmp/plotly/plotly/subplots.py
16K /tmp/plotly/plotly/data
16K /tmp/plotly/plotly/plotly
20K /tmp/plotly/plotly/graph_objects
28K /tmp/plotly/plotly/tools.py
36K /tmp/plotly/plotly/basewidget.py
52K /tmp/plotly/plotly/_subplots.py
76K /tmp/plotly/plotly/offline
220K /tmp/plotly/plotly/basedatatypes.py
264K /tmp/plotly/plotly/matplotlylib
352K /tmp/plotly/plotly/__pycache__
368K /tmp/plotly/plotly/io
380K /tmp/plotly/plotly/express
668K /tmp/plotly/plotly/figure_factory
3.7M /tmp/plotly/plotly/package_data
45M /tmp/plotly/plotly/graph_objs
84M /tmp/plotly/plotly/validators
135M total
(As always, thank you for plotly.)
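For tracking this over releases without `du`, a small Python equivalent might be handy. This is a sketch with hypothetical helper names; it sums apparent file sizes rather than disk blocks, so the numbers will come out a little lower than the `du` listing above.

```python
from pathlib import Path


def tree_size(path):
    """Apparent size in bytes of a file, or of all files under a directory."""
    path = Path(path)
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def size_report(package_dir):
    """Return (name, bytes) pairs for each top-level entry, smallest first."""
    entries = [(child.name, tree_size(child)) for child in Path(package_dir).iterdir()]
    return sorted(entries, key=lambda item: item[1])


def share(report, names):
    """Fraction of the total size taken up by the entries listed in `names`."""
    total = sum(size for _, size in report)
    selected = sum(size for name, size in report if name in names)
    return selected / total if total else 0.0
```

For example, `share(size_report("/tmp/plotly/plotly"), {"graph_objs", "validators"})` gives the fraction of the package taken up by the two generated directories.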
Any news on this? It's making it difficult to deploy AWS Lambda functions containing plotly, even when zipped.
Hi - we are tidying up stale issues and PRs in Plotly's public repositories so that we can focus on things that are still important to our community. Since this one has been sitting for a while, I'm going to close it; if you'd like to submit a PR, we'd be happy to prioritize a review. Thank you - @gvwilson
Just observing that this continues to creep up: the package size is now 151MB in the latest release, 5.22.0.
metric | 5.1.0 | 5.8.0 | 5.22.0 |
---|---|---|---|
graph_objs/ | 43 | 45 | 48 |
validators/ | 80 | 84 | 97 |
total package size | 123 | 131 | 151 |
(Sizes in MB)
pip install --target=/tmp/plotly plotly==5.22.0
du -sch /tmp/plotly/plotly/* | sort -h
4.0K /tmp/plotly/plotly/_version.py
4.0K /tmp/plotly/plotly/_widget_version.py
4.0K /tmp/plotly/plotly/animation.py
4.0K /tmp/plotly/plotly/config.py
4.0K /tmp/plotly/plotly/conftest.py
4.0K /tmp/plotly/plotly/dashboard_objs.py
4.0K /tmp/plotly/plotly/exceptions.py
4.0K /tmp/plotly/plotly/files.py
4.0K /tmp/plotly/plotly/grid_objs.py
4.0K /tmp/plotly/plotly/missing_ipywidgets.py
4.0K /tmp/plotly/plotly/optional_imports.py
4.0K /tmp/plotly/plotly/presentation_objs.py
4.0K /tmp/plotly/plotly/serializers.py
4.0K /tmp/plotly/plotly/session.py
4.0K /tmp/plotly/plotly/validator_cache.py
4.0K /tmp/plotly/plotly/version.py
4.0K /tmp/plotly/plotly/widgets.py
8.0K /tmp/plotly/plotly/__init__.py
8.0K /tmp/plotly/plotly/callbacks.py
8.0K /tmp/plotly/plotly/colors
8.0K /tmp/plotly/plotly/utils.py
12K /tmp/plotly/plotly/shapeannotation.py
12K /tmp/plotly/plotly/subplots.py
16K /tmp/plotly/plotly/data
16K /tmp/plotly/plotly/plotly
20K /tmp/plotly/plotly/graph_objects
28K /tmp/plotly/plotly/tools.py
36K /tmp/plotly/plotly/basewidget.py
52K /tmp/plotly/plotly/_subplots.py
76K /tmp/plotly/plotly/offline
224K /tmp/plotly/plotly/basedatatypes.py
256K /tmp/plotly/plotly/matplotlylib
352K /tmp/plotly/plotly/__pycache__
364K /tmp/plotly/plotly/io
392K /tmp/plotly/plotly/express
668K /tmp/plotly/plotly/figure_factory
3.6M /tmp/plotly/plotly/package_data
48M /tmp/plotly/plotly/graph_objs
97M /tmp/plotly/plotly/validators
151M total
Just wanted to add my voice to this.
It appears the size keeps increasing:
plotly 5.24.1
179.8M /usr/local/lib/python3.12/site-packages/plotly
Just to add that as the size keeps increasing, it's a problem not only for AWS Lambda, but anywhere we deploy code (container images, VMDKs, etc.)
What versions of python does plotly currently support? Maybe I can whip up a PR implementing some of the above ideas.
Thank you for plotly.py, it's definitely worked well for us in our app!
We're deploying our app's backend to AWS Lambda, packaging dependencies in a "layer" which has a 256MB size limit. We are hitting this limit. Unfortunately, plotly's Python library is huge: for the version we're using there (4.14.3), it ends up being 58MB of Python source, and ~19MB of JavaScript (plotly.min.js, and then the Jupyter plugin). The Python source seems to be almost entirely the auto-generated (AIUI) `graph_objs` and `validators` subdirectories. To reduce size, we've removed the JavaScript files, because the lambdas don't use any of that, but that still leaves a significant amount of Python code.
To make this more concrete, here are the numbers for the latest version on my Mac:
That is, 123MiB/129MiB (95%) of the package size is the autogenerated `graph_objs` and `validators` submodules.
Since these are autogenerated, potentially they could be autogenerated in a way that makes them significantly smaller without changing behaviour or structure. Some ideas:
These will require disabling black and generally make the files harder to read, but I don't think they're designed to be human readable anyway?
(There's also other possibilities like combining multiple files into one, allowing sharing imports, but this is probably only a small win, and will require changing other code.)
For example, starting with https://github.com/plotly/plotly.py/blob/v5.1.0/packages/python/plotly/plotly/graph_objs/bar/_stream.py one could save ~20%: https://gist.github.com/huonw/4b81b6825ebd508bbcd39f4bb2215f4e
[...] `# ----` comments
Assuming this 20% decrease generalises across all the autogenerated files, this would cut nearly 25MB off the 129M package.
(Thanks again for plotly!)
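To get a quick feel for how much the comment-stripping part of this might save on a given file, a round trip through the standard library's `ast` module is one option. This is only a sketch and not the transformation used in the gist: it needs Python 3.9+ for `ast.unparse`, it keeps docstrings and 4-space indentation, and it drops all comments and blank lines (not only the `# ----` ones). The function names are hypothetical.

```python
import ast


def minify_source(source):
    """Round-trip through the AST: comments and blank lines are not preserved."""
    tree = ast.parse(source)
    return ast.unparse(tree) + "\n"


def saving(source):
    """Fraction of bytes saved by minify_source on this source text."""
    original = len(source.encode("utf-8"))
    minified = len(minify_source(source).encode("utf-8"))
    return 1 - minified / original
```

Calling `saving(path.read_text())` over the generated files would show whether a figure like the ~20% above holds up across the whole package.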