huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Docker container with development environment for Transformers library #32628

Closed manuelsh closed 3 weeks ago

manuelsh commented 3 months ago

Feature request

A Docker container, usable with or without a GPU, that has everything needed to start developing (if one doesn't already exist), plus a reference to it in the CONTRIBUTING.md file.

Motivation

Lower the barrier to start developing on any system with Docker.

Your contribution

I could build it. It would be something along the lines of:

FROM python:3.10

# Install system dependencies
RUN apt-get update && \
    apt-get install -y git && \
    pip install --upgrade pip --no-cache-dir && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set a temp directory to install environment
WORKDIR /usr/src/temp

# Clone the repository
RUN git clone https://github.com/huggingface/transformers.git

# Set the working directory to the cloned repository
WORKDIR /usr/src/temp/transformers

# Install development dependencies (quoted so the shell does not glob the brackets)
RUN pip install -e ".[dev]"

# Go to the working directory for development
WORKDIR /usr/src/app

plus a Docker Compose file like:

services:
  app:
    build: .
    volumes:
      - ~:/usr/src/app
    ports:
      - "8000:8000"
    entrypoint: ["tail", "-f", "/dev/null"]
    environment:
      - RUN_SLOW=true

amyeroberts commented 3 months ago

Hi @manuelsh, thanks for opening this feature request!

We have a suite of dockerfiles defined here: https://github.com/huggingface/transformers/tree/main/docker

Would you like to add a reference to them in CONTRIBUTING.md, with a guide on how to use them?

manuelsh commented 3 months ago

Thanks for the answer, @amyeroberts! I saw them before and was wondering whether they are explained anywhere (I could find some of them in the documentation). Maybe a README.md in the docker folder would do it.

Also, out of all of them, which one installs the development environment? I couldn't find any that runs the pip install -e ".[dev]" command.

amyeroberts commented 2 months ago

Oh, good point. I don't think there's a specific dev image. One thing to consider is that pip install -e ".[dev]" won't install any of the ML libraries needed to develop on models, e.g. torch, tensorflow or flax, so you might need framework-specific images.

cc @ydshieh WDYT?
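
One way to picture the framework-specific images mentioned above is a Compose file that builds the same Dockerfile once per framework. This is only a sketch under assumptions: it presumes the Dockerfile from the first comment is changed to declare a hypothetical build argument `EXTRAS` (for example `ARG EXTRAS=dev` followed by `RUN pip install -e ".[${EXTRAS}]"`), and that the `dev-torch` and `dev-tensorflow` extras are defined in `setup.py` at your checkout. The service names are made up for illustration.

```yaml
# Sketch only: EXTRAS is a hypothetical build argument that the Dockerfile
# would need to declare and pass to `pip install -e ".[${EXTRAS}]"`.
services:
  dev-torch:
    build:
      context: .
      args:
        EXTRAS: "dev-torch"        # assumed extra name; check setup.py
    entrypoint: ["tail", "-f", "/dev/null"]

  dev-tensorflow:
    build:
      context: .
      args:
        EXTRAS: "dev-tensorflow"   # assumed extra name; check setup.py
    entrypoint: ["tail", "-f", "/dev/null"]
```

Keeping one service per framework avoids installing everything into a single image, at the cost of building and storing several images.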

ydshieh commented 2 months ago

We are mostly using

    container:
      image: huggingface/transformers-all-latest-gpu

which contains torch, tensorflow and other packages, but no flax/jax. (It is built daily.)

I think this is quite sufficient. An image with all 3 frameworks installed sometimes just increases the chance of chaos. (I sometimes need to uninstall tensorflow before running something.)
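
For anyone who wants to try the prebuilt image as a development container, a Compose file along these lines might work. It is a sketch, not something the maintainers describe: the service name, the mount paths and the working directory are hypothetical, and the GPU reservation assumes the NVIDIA Container Toolkit is installed on the host (drop the `deploy` section on a CPU-only machine).

```yaml
# Sketch only: paths and the service name are illustrative assumptions.
services:
  dev:
    image: huggingface/transformers-all-latest-gpu
    volumes:
      # Mount a local checkout of the repository into the container.
      - ./transformers:/workspace/transformers
    working_dir: /workspace/transformers
    # Keep the container alive so you can attach a shell to it.
    entrypoint: ["tail", "-f", "/dev/null"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

After `docker compose up -d`, a shell can be opened with `docker compose exec dev bash`. To make Python pick up the mounted checkout, you would probably still need to run `pip install -e .` inside the container.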

ArthurZucker commented 2 months ago

@manuelsh would you like to add a README on the usage of these dockers? 🤗 Otherwise I can draft a small README explaining usage. TL;DR: it's for our CIs, which separate the frameworks!

LysandreJik commented 2 months ago

@ArthurZucker or @ydshieh it might be a good idea to draft a quick README indeed as it's going to be hard to know the differences/what they're used for otherwise

manuelsh commented 3 weeks ago

Fantastic to see it happening. I didn't have the context on the dockers, now I do. Thanks @ArthurZucker