I don't know if it's easy or even possible to make a CPU-only option available in setup.py.
Truthfully, I don't know anything about using Docker. Is it possible to make the requirements file ask for the CPU version of torch? If you go here, you can see such a thing exists:
https://pytorch.org/get-started/locally/
something like:

```
pip3 install torch --index-url https://download.pytorch.org/whl/cpu
```
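Something along these lines might also work in the requirements file itself, though I haven't tried it. A rough sketch; the version pin is just an example, and it's the `+cpu` local tag (which only exists on the PyTorch index, not on PyPI) that forces the CPU wheel:

```
# requirements.txt
# pull torch from the CPU-only wheel index in addition to PyPI
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.0.1+cpu   # the +cpu builds do not bundle the CUDA libraries
stanza
```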
Frankly, I don't know anything about how to make this work. I can ask around my lab to see if someone else does.
Also, from what I've heard, models that use the charlm can be much slower on CPU than on GPU. We're working on getting that straightened out, probably by providing non-charlm versions of the models.
On Mon, Aug 14, 2023 at 6:53 AM FSchmidt-FUNKE wrote:
Hi everyone,
I want to build a service that uses stanza, but our Cloud Foundry provider only allows a maximum Docker container size of 4 GB. Because torch is a huge package that installs many CUDA libraries etc., the container with the German model is over 6 GB.
```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim-bullseye

RUN apt-get update && \
    apt-get upgrade -y && \
    apt install sudo -y

RUN useradd -ms /bin/bash worker
USER worker
WORKDIR /home/worker
ENV PATH="/home/worker/.local/bin:${PATH}"

RUN python -m pip install --no-cache-dir --user --disable-pip-version-check --upgrade pip
COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir --user -r requirements.txt

COPY download_model.py download_model.py
RUN python3 download_model.py

COPY app.py app.py
COPY app_logic.py app_logic.py
COPY utils.py utils.py

EXPOSE 8080
CMD ["gunicorn", "--bind", ":8080", "--workers", "1", "--worker-class", "uvicorn.workers.UvicornWorker", "--threads", "10", "--timeout", "3600", "app:app"]
```
I only need a CPU version and I only want to do inference. Stanza provides better results for my use case than spaCy, but currently I'm forced to use spaCy, because that container is only 1.5 GB with the de_core_news_lg model. Is there a smaller CPU inference version, or a Dockerfile that builds a smaller image for inference on CPU?
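The download_model.py referenced in the Dockerfile isn't shown in the thread; for the German setup described above it is presumably something close to this sketch:

```python
# download_model.py -- hypothetical reconstruction
import stanza

# Fetch the German models at image build time so the container
# does not have to download them on first request.
stanza.download("de")
```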
This is from one of my labmates:
If avoidable at all, I would not try to install PyTorch from scratch myself in Docker for deployment (training is another matter). They could try starting out with a very small container that already has the version of PyTorch they want, like bitnami/pytorch:2.0.1 (see https://hub.docker.com/r/bitnami/pytorch/tags).
If that doesn't work then they could try changing the optimized recipe to suit their needs by looking at the original Dockerfile: https://github.com/bitnami/containers/blob/main/bitnami/pytorch/2/debian-11/Dockerfile
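As a rough, untested sketch of that suggestion, the original Dockerfile above could be rebased onto the prebuilt image; details such as the image's default user and writable paths would need to be checked against the bitnami documentation:

```dockerfile
# sketch: start from an image that already ships PyTorch
FROM bitnami/pytorch:2.0.1

WORKDIR /app
# torch is already in the base image, so it should be left out of
# requirements.txt (or pinned to the version the base image ships)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY download_model.py .
RUN python download_model.py

COPY app.py app_logic.py utils.py ./
EXPOSE 8080
CMD ["gunicorn", "--bind", ":8080", "--workers", "1", "--worker-class", "uvicorn.workers.UvicornWorker", "--threads", "10", "--timeout", "3600", "app:app"]
```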
A good reference for tips and tricks to reduce your container size can be found here:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Given the time frame between responses, I think I'll just close this. If there's some support we can give for making Docker work, please let us know, but it's also not a priority given the number of other things which are.
Hi,
I also stumbled upon this issue. My project uses https://python-poetry.org/ as a package manager, and I found a solution that worked for me here: https://github.com/python-poetry/poetry/issues/7685#issuecomment-1632693935

```
poetry source add --priority=explicit pytorch-cpu https://download.pytorch.org/whl/cpu
poetry add --source pytorch-cpu torch
```

(The source has to be registered before torch can be added from it, hence the order.)
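For anyone editing pyproject.toml by hand instead of running the commands, they record roughly the following; the version constraint here is a placeholder for whatever poetry resolves:

```toml
[[tool.poetry.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"

[tool.poetry.dependencies]
torch = { version = "*", source = "pytorch-cpu" }
```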
Maybe this helps someone else as well :)