tensorchord / envd

🏕️ Reproducible development environment
https://envd.tensorchord.ai/
Apache License 2.0

bug: conda cache does not work #1527

Open gaocegege opened 1 year ago

gaocegege commented 1 year ago

Are you using the envd server?

Describe the bug

def build():
    config.repo(url="https://github.com/tensorchord/envd", description="gnn")
    base(os="ubuntu20.04", language="python3.7")

    install.cuda(version="11.3.1")

    install.python_packages(name=[
        "dgllife",
    ])

    install.conda_packages(
        name=[
            "pytorch",
            "cudatoolkit=11.3",
            "rdkit",
            "dgl-cuda11.3",
        ],
        channel=[
            "pytorch",
            "conda-forge",
            "dglteam",
        ],
    )
    shell("bash")

The conda packages are not cached.

To Reproduce

Expected behavior

No response

The docker info output

None

The envd version output

v0.3.11

Additional context

No response

gaocegege commented 1 year ago

@cutecutecat

Could you please have a look?

Electronic-Waste commented 8 months ago

Maybe I can give it a try.

Electronic-Waste commented 8 months ago

I can't download the dependencies. I wonder if it's due to my OS (macOS).

#32 [internal] /opt/conda/bin/conda install -n envd -c pytorch -c conda-forge -c dglteam pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3 pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3
#32 73.23 done
#32 73.23 Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
#32 410.1 Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
#32 1866.9 Collecting package metadata (repodata.json): ...working... WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.8.0.*, but conda is ignoring the .* and treating it as 1.8.0
#32 2078.4 WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.9.0.*, but conda is ignoring the .* and treating it as 1.9.0
#32 2078.4 WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.6.0.*, but conda is ignoring the .* and treating it as 1.6.0
#32 2157.3 done
#32 2157.3 Solving environment: ...working... DEBU[2024-01-09T13:15:48+08:00] stopping session                             

#32 ERROR: process "/opt/conda/bin/conda install -n envd -c pytorch -c conda-forge -c dglteam pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3" did not complete successfully: exit code: 137
------
 > importing cache manifest from docker.io/tensorchord/python-cache:envd-v0.3.43-cuda-11.3.1-cudnn-8:
------
------
 > [internal] /opt/conda/bin/conda install -n envd -c pytorch -c conda-forge -c dglteam pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3 pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3:
#32 2157.3 Solving environment: ...working... 

#0 1.682 Collecting package metadata (current_repodata.json): ...working... WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.7.1.*, but conda is ignoring the .* and treating it as 1.7.1
#32 73.23 done
failed with initial frozen solve. Retrying with flexible solve.
WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.8.0.*, but conda is ignoring the .* and treating it as 1.8.0
#32 2078.4 WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.9.0.*, but conda is ignoring the .* and treating it as 1.9.0
#32 2078.4 WARNING conda.models.version:get_matcher(537): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.6.0.*, but conda is ignoring the .* and treating it as 1.6.0
#32 2157.3 done
------
ERRO[2024-01-09T13:15:48+08:00] Buildkit error: failed to solve: process "/opt/conda/bin/conda install -n envd -c pytorch -c conda-forge -c dglteam pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3" did not complete successfully: exit code: 137
(1) attached stack trace
  -- stack trace:
  | github.com/tensorchord/envd/pkg/builder.generalBuilder.build.func1
  |     /home/runner/work/envd/envd/pkg/builder/build.go:265
  | golang.org/x/sync/errgroup.(*Group).Go.func1
  |     /home/runner/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75
  | runtime.goexit
  |     /opt/hostedtoolcache/go/1.19.10/x64/src/runtime/asm_arm64.s:1172
Wraps: (2) Buildkit error
Wraps: (3) failed to solve: process "/opt/conda/bin/conda install -n envd -c pytorch -c conda-forge -c dglteam pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3" did not complete successfully: exit code: 137
  | (1) failed to solve: process "/opt/conda/bin/conda install -n envd -c pytorch -c conda-forge -c dglteam pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3" did not complete successfully: exit code: 137
  | Error types: (1) *builder.BuildkitdErr
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *builder.BuildkitdErr 
ERRO[2024-01-09T13:15:48+08:00]                                               error="failed to load docker image: Post \"http://%2Fvar%2Frun%2Fdocker.sock/v1.43/images/load?quiet=1\": context canceled" language-version=v0 tag="envd-quick-start:dev"
FATA[2024-01-09T13:15:48+08:00] exit                                          app=envd error="failed to build the image: failed to build: failed to wait error group: Buildkit error: failed to solve: process \"/opt/conda/bin/conda install -n envd -c pytorch -c conda-forge -c dglteam pytorch cudatoolkit=11.3 rdkit dgl-cuda11.3\" did not complete successfully: exit code: 137" version=v0.3.43
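
A note on the `exit code: 137` in the log above: by the usual POSIX shell convention, an exit status above 128 means the process was terminated by a signal, with the signal number being the status minus 128. For 137 that is signal 9 (SIGKILL), which is often, though not always, what you see when the kernel's out-of-memory killer stops a process. Nothing in this thread confirms that this is what happened here; the sketch below only decodes the status, and `describe_exit_code` is a hypothetical helper name, not part of envd or BuildKit.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a process exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

If the process was indeed killed for memory, comparing the solver's memory use against the `Total Memory` reported by `docker info` would be a reasonable next check.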

My docker info:

Client:
 Version:    24.0.2
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.0
    Path:     /Users/x/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.19.1
    Path:     /Users/x/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/x/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.20
    Path:     /Users/x/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.6
    Path:     /Users/x/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/x/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/x/.docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  0.16.1
    Path:     /Users/x/.docker/cli-plugins/docker-scout

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 3
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.49-linuxkit-pr
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 5
 Total Memory: 7.667GiB
 Name: docker-desktop
 ID: 0a1c4432-d01a-4090-b9da-8cf7b4464c9d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

kemingy commented 8 months ago

@Electronic-Waste can you provide your envd build file?

Electronic-Waste commented 8 months ago

My envd build file is the one provided by @gaocegege at the beginning of this issue.