Provide a non-ML "open3d-lite" package to minimize Linux wheel size

johnthagen commented 2 years ago

Checklist

[X] I have searched for similar issues.
[X] For Python issues, I have tested with the latest development wheel.
[X] I have checked the release documentation and the latest documentation (for master branch).

Proposed new feature or change

If you look at a release that shows all supported platforms: https://pypi.org/project/open3d/0.14.1/#files

Notice the Linux x86_64 wheel is 400MB while most other wheels are between 30-70MB. If you decompress the wheel, you find that the cpu and cuda folders represent the vast majority of the space. Specifically:

cpu/pybind.cpython-39-x86_64-linux-gnu.so: 177MB
cuda/pybind.cpython-39-x86_64-linux-gnu.so: 752MB
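
For anyone who wants to repeat this measurement on another release, here is a minimal sketch that lists the largest files in a wheel without unpacking it. The function name `largest_members` and the script name in the usage comment are made up for illustration; the only assumption is that a wheel is an ordinary zip archive.

```python
import sys
import zipfile


def largest_members(wheel_path, top=5):
    """Return (uncompressed_size_bytes, name) pairs for the biggest files."""
    # A wheel is a regular zip archive, so its index can be read
    # without extracting anything to disk.
    with zipfile.ZipFile(wheel_path) as wheel:
        sizes = [(info.file_size, info.filename) for info in wheel.infolist()]
    # Tuples sort by size first, so reverse order puts the largest on top.
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python wheel_sizes.py path/to/wheel.whl
    for size, name in largest_members(sys.argv[1]):
        print(f"{size / 1e6:8.1f} MB  {name}")
```

The sizes reported are uncompressed, which is what ends up on disk in the final image.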

When deploying Open3D within a Linux x86_64 Docker image, minimizing the final image size is very important and there are many applications that can use Open3D without Tensorflow or PyTorch. Allowing users to select a PyPI package/wheel without the ML dependencies built into it would make these deployments more efficient.

Some ideas:

Create an open3d-lite PyPI package where wheels are published without the ML dependencies/features (this would also match how the rest of the platforms work).
OR, leave open3d as the base non-ML package, and publish an open3d-ml PyPI package for only the platforms that support pre-built Tensorflow/PyTorch integration.

wasdee commented 1 year ago

voting for

leave open3d as the base non-ML package, and publish an open3d-ml PyPI package for only the platforms that support pre-built Tensorflow/PyTorch integration.

johnthagen commented 1 year ago

As another motivating example, a further problem we've run into using Open3D in container images relates to start-up time.

Open3D within a Linux container (likely due to its very large DLLs?) takes a long time to import the first time on a system. Since container images are immutable, this cost is paid each time the container starts (e.g. byte code is not cached), whereas developers using Open3D outside of a container only pay it once each time they update Open3D and then don't experience it again for a long time.

With import open3d at module scope in a web backend, for example, we've noticed that the first time a user makes a web request there is a long 2 second pause while that worker imports Open3D for the first time. Since the production application runs within containers, any time the container is restarted (due to the host system restarting, etc.) the next time those workers get their first requests there is a long delay. With N worker processes, this cost has to be paid N times.

In our case, even determining that Open3D's import was the root cause took us a long time.

There are ways to try to work around this, like lazily importing Open3D, but this is just one more example of why allowing users to take only the pieces of Open3D they need can have many real-world impacts.

To reproduce this:

FROM python:3.10-bullseye

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV PIP_NO_CACHE_DIR 1

RUN apt-get update && apt-get install -y \
    # http://www.open3d.org/docs/release/docker.html
    libgl1 \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install "open3d==0.16.0"

$ docker build --tag open3d .
$ docker run --rm --interactive --tty open3d bash

$ time python -c "import open3d"

real    0m2.677s
user    0m3.511s
sys     0m1.264s
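
For reference, the lazy import workaround mentioned above could look roughly like this. It is only a sketch: `lazy_module` and `load_point_count` are hypothetical names, and the helper moves the import cost to the first call rather than removing it.

```python
import functools
import importlib


@functools.lru_cache(maxsize=None)
def lazy_module(name):
    """Import a module on first use and return the same object afterwards."""
    # importlib.import_module does the same work as an `import` statement,
    # but only when this function is first called for a given name.
    return importlib.import_module(name)


def load_point_count(path):
    """Hypothetical helper that is the only code path needing Open3D."""
    o3d = lazy_module("open3d")  # the slow import happens here, once per process
    cloud = o3d.io.read_point_cloud(path)
    return len(cloud.points)
```

Requests that never touch point clouds skip the import entirely. Calling `lazy_module("open3d")` from a worker start-up hook would pay the cost before the first user request instead of during it.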

ssheorey commented 1 year ago

Hi @johnthagen we have looked into the wheel size and startup time issues. Here are some of our observations:

The main contributors to the wheel size are the CUDA kernels (as you point out: cuda/pybind.cpython-39-x86_64-linux-gnu.so: 752MB), since they are built for multiple CUDA architectures. We periodically trim the selected CUDA architectures in the build. More definite, data-driven advice about which CUDA architectures are in actual use would help reduce the wheel size here.
The ML code is Python and takes up little space (~1.5MB / 1.2GB uncompressed) in the wheel itself, so a non-ML wheel will not help. We are looking into making the ML dependencies optional. The ML code doesn't contribute to the startup time either.
The startup time is indeed due to the large DSO size and directly depends on the number of exported symbols. Trimming unnecessary exported symbols by hiding them is an ongoing maintenance process and we would appreciate PR help here. See DSO symbol visibility for more information about this.
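
As a rough way to track progress on the symbol trimming described above, the exported symbols of a built library can be counted. This sketch assumes GNU binutils `nm` is installed; both function names are invented for illustration, and the count is only a proxy for load time.

```python
import subprocess


def count_dynamic_symbols(nm_output):
    """Count symbols in the text printed by `nm -D --defined-only`."""
    count = 0
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols print as "<address> <type> <name>".
        if len(parts) >= 3:
            count += 1
    return count


def exported_symbol_count(shared_object):
    """Run nm on a shared object and count its defined dynamic symbols."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", shared_object],
        check=True,
        capture_output=True,
        text=True,
    )
    return count_dynamic_symbols(result.stdout)
```

Comparing the count for the pybind .so before and after a visibility change gives a quick sanity check that symbols were actually hidden.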

ssheorey commented 1 year ago

See this earlier PR for efforts to address these issues: https://github.com/isl-org/Open3D/pull/3542 Future PRs along this line would be welcome.

johnthagen commented 1 year ago

The ML code is Python and takes up little space (~1.5MB / 1.2GB uncompressed) in the wheel itself, so a non-ML wheel will not help

Thank you for the clarification. I was grouping "CUDA" and "ML" together in my head as I assumed that the CUDA dependencies were strictly used for ML functionality within Open3D. So I had been thinking that a non-ML wheel wouldn't need to have the CUDA DLLs at all.

ssheorey commented 1 year ago

Many of Open3D's geometry processing workflows are now accelerated with CUDA :-)

amarion35 commented 1 year ago

But it's only useful if your code is running on a machine with a GPU. If you don't use a GPU it's just dead weight.

johnthagen commented 1 year ago

with a GPU

And, to be more specific, an NVIDIA GPU.

ssheorey commented 1 year ago

Agreed. If you would prefer a (significantly smaller) CPU only wheel (along the lines of pytorch-cpu and tensorflow-cpu), please open a new issue. We can definitely consider providing that.

johnthagen commented 1 year ago

@ssheorey

Agreed. If you would prefer a (significantly smaller) CPU only wheel (along the lines of pytorch-cpu and tensorflow-cpu), please open a new issue. We can definitely consider providing that.

Great! This would certainly help the overall issue at hand.

I created:

#5869

johnthagen commented 1 year ago

@ssheorey Concerning start-up time, I was surprised by the following results on the latest Open3D (0.17.0):

FROM python:3.10-bullseye

ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1
ENV PIP_NO_CACHE_DIR 1

RUN apt-get update && apt-get install -y \
    # http://www.open3d.org/docs/release/docker.html
    libgl1 \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install "open3d-cpu==0.17.0"

$ docker build --tag open3d .
$ docker run --rm --interactive --tty open3d bash

$ time python -c "import open3d"

real    0m2.990s
user    0m3.880s
sys     0m1.176s

I expected that using the -cpu wheels together with #5877 would have sped up the import time, but that doesn't seem to be the case. This was tested on an x86_64 MacBook running Docker Desktop.

Is it expected to still take 2-3 seconds to import open3d in this case?
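
To make numbers like these easier to compare across versions, here is a small sketch that times the import in several fresh interpreters. The function name `time_import` is made up, and each timing includes interpreter start-up, so a cheap module is worth timing as a baseline.

```python
import subprocess
import sys
import time


def time_import(module, runs=3):
    """Return wall-clock seconds for `import module` in fresh interpreters."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process each time, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python time_import.py open3d
    for seconds in time_import(sys.argv[1]):
        print(f"{seconds:.3f}s")
```

For a per-module breakdown, `python -X importtime -c "import open3d"` prints the cumulative time spent in each imported module to stderr.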

johnthagen commented 1 month ago

@ssheorey For documentation purposes, I retried this experiment on the latest Open3D (0.18.0) and Python 3.11 and found a similar result: importing open3d-cpu within a container still takes 2-3 seconds.

FROM python:3.11-slim-bookworm

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV PIP_NO_CACHE_DIR=1

RUN apt-get update && apt-get install -y \
    # https://www.open3d.org/docs/release/docker.html
    libgl1 \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

RUN pip install "open3d-cpu==0.18.0"

$ docker build --tag open3d .
$ docker run --rm --interactive --tty open3d bash

$ time python -c "import open3d"

real    0m2.326s
user    0m6.336s
sys     0m0.213s

Compare this to printing Hello World:

$ time python -c "print('Hello World')"
Hello World

real    0m0.010s
user    0m0.000s
sys     0m0.001s

isl-org / Open3D

Provide a non-ML "open3d-lite" package to minimize Linux wheel size #5124
