ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[docker] Don't depend on miniconda for the Ray base images? #42077

Open richardliaw opened 9 months ago

richardliaw commented 9 months ago

Description

The Ray base image uses Anaconda/Miniconda; we should consider removing this dependency.

https://github.com/ray-project/ray/blob/master/docker/base-deps/Dockerfile#L49C1-L51C31

The anaconda policy is:

(b) You may not use Free Offerings for commercial purposes, including but not limited to external business use, third-party access, Content mirroring, or use in organizations over two hundred (200) employees (unless its use for an Educational Purpose) (each, a “Commercial Purpose”). Using the Free Offerings for a Commercial Purpose requires a Paid Plan with Anaconda.

One option might be to switch the base image to use conda-forge which doesn't have these restrictions (see https://github.com/conda-forge/miniforge).
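
As a rough sketch of what that switch could look like in an image build, the helper below prints the download URL of the latest Miniforge3 installer for the current platform, following the URL pattern documented in the conda-forge/miniforge README. The function name `miniforge_installer_url` is hypothetical and is not taken from Ray's Dockerfile.

```shell
# Hypothetical helper (not from Ray's Dockerfile): prints the download URL
# of the latest Miniforge3 installer for the current OS and CPU architecture,
# using the URL pattern from the conda-forge/miniforge README.
miniforge_installer_url() {
  printf 'https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-%s-%s.sh\n' \
    "$(uname)" "$(uname -m)"
}
```

A build step could then run something like `curl -fsSL -o /tmp/miniforge.sh "$(miniforge_installer_url)"` followed by `bash /tmp/miniforge.sh -b -p "$HOME/miniforge3"`. The install prefix here is an assumption for illustration, not the path the Ray images use.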

Use case

No response

davidxia commented 1 month ago

I see the latest ray-ml image ships a conda that's configured to pull from repo.anaconda.com.

docker run -it --entrypoint bash rayproject/ray-ml:2.34.0.fc8721-gpu -c 'conda info'

     active environment : None
       user config file : /home/ray/.condarc
 populated config files :
          conda version : 24.7.1
    conda-build version : not installed
         python version : 3.9.19.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=broadwell
                          __conda=24.7.1=0
                          __glibc=2.35=0
                          __linux=5.15.0=0
                          __unix=0=0
       base environment : /home/ray/anaconda3  (writable)
      conda av data dir : /home/ray/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/ray/anaconda3/pkgs
                          /home/ray/.conda/pkgs
       envs directories : /home/ray/anaconda3/envs
                          /home/ray/.conda/envs
               platform : linux-64
             user-agent : conda/24.7.1 requests/2.31.0 CPython/3.9.19 Linux/5.15.0-1062-gcp ubuntu/22.04.4 glibc/2.35 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8 aau/0.4.4 c/. s/. e/.
                UID:GID : 1000:100
             netrc file : None
           offline mode : False

davidxia commented 1 month ago

Would an alternative be to configure conda to use the conda-forge channel instead of the default Anaconda channels?

$ docker run -it --entrypoint bash rayproject/ray-ml:2.34.0.fc8721-gpu

ray@1ed5b90a8fd0:conda config --remove channels defaults
ray@1ed5b90a8fd0:conda config --add channels conda-forge

ray@1ed5b90a8fd0:conda info

     active environment : base
    active env location : /home/ray/anaconda3/envs/base
            shell level : 2
       user config file : /home/ray/.condarc
 populated config files : /home/ray/.condarc
          conda version : 24.7.1
    conda-build version : not installed
         python version : 3.9.19.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=broadwell
                          __conda=24.7.1=0
                          __glibc=2.35=0
                          __linux=5.15.0=0
                          __unix=0=0
       base environment : /home/ray/anaconda3  (writable)
      conda av data dir : /home/ray/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/ray/anaconda3/pkgs
                          /home/ray/.conda/pkgs
       envs directories : /home/ray/anaconda3/envs
                          /home/ray/.conda/envs
               platform : linux-64
             user-agent : conda/24.7.1 requests/2.31.0 CPython/3.9.19 Linux/5.15.0-1062-gcp ubuntu/22.04.4 glibc/2.35 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8 aau/0.4.4 c/. s/. e/.
                UID:GID : 1000:100
             netrc file : None
           offline mode : False

Are channel URLs like https://conda.anaconda.org/conda-forge still free to use without a license? They seem to be the same as the default channels for a miniforge-installed conda.

(base) root@4fb84a90aee8:/# conda info

     active environment : base
    active env location : /root/miniforge3
            shell level : 1
       user config file : /root/.condarc
 populated config files : /root/miniforge3/.condarc
          conda version : 24.3.0
    conda-build version : not installed
         python version : 3.10.14.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=broadwell
                          __conda=24.3.0=0
                          __glibc=2.39=0
                          __linux=5.15.0=0
                          __unix=0=0
       base environment : /root/miniforge3  (writable)
      conda av data dir : /root/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /root/miniforge3/pkgs
                          /root/.conda/pkgs
       envs directories : /root/miniforge3/envs
                          /root/.conda/envs
               platform : linux-64
             user-agent : conda/24.3.0 requests/2.31.0 CPython/3.10.14 Linux/5.15.0-1062-gcp ubuntu/24.04 glibc/2.39 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8
                UID:GID : 0:0
             netrc file : None
           offline mode : False
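
To compare setups like the ones above without reading the whole `conda info` dump, a small filter can pick out the channel URLs served from repo.anaconda.com. This is only a sketch: the function name `anaconda_default_channels` is made up for illustration, and it assumes the channel URLs arrive one per line on stdin.

```shell
# Hypothetical helper (not part of conda or Ray): reads channel URLs from
# stdin, one per line, and prints only those hosted on repo.anaconda.com.
anaconda_default_channels() {
  # grep exits non-zero when nothing matches; "|| true" keeps that from
  # looking like a failure when every channel is already conda-forge.
  grep -E '^https?://repo\.anaconda\.com/' || true
}
```

For example, `printf '%s\n' https://repo.anaconda.com/pkgs/main/linux-64 https://conda.anaconda.org/conda-forge/noarch | anaconda_default_channels` prints only the first URL. Getting the URLs out of conda is left to the caller; one possible source is the `channels` list in `conda info --json`, assuming that key is present in your conda version.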

richardliaw commented 1 month ago

@anyscalesam - have we seen this come up in other places as well?

davidxia commented 1 month ago

> Are channel URLs like https://conda.anaconda.org/conda-forge still free to use without a license?

Answering my own question, I think they are.

rclough commented 1 month ago

It's worth mentioning that we're interested in a completely conda-free image if possible, but conda-forge-only would be a good first step.

anyscalesam commented 2 weeks ago

@davidxia @rclough we discussed this internally last week and believe the best step is to move from conda to conda-forge. To do this, we need to upgrade and build an image, then run it through our complete CI and release tests to ensure stability. We're going to bring this up in planning for next month so the work can happen right after Ray Summit (so October).