pytorch / builder

Continuous builder and binary build scripts for pytorch
BSD 2-Clause "Simplified" License

Unable to build Docker image with release/2.4 branch #1922

Open ajindal1 opened 2 months ago

ajindal1 commented 2 months ago

I am unable to build the Docker image from the release/2.4 branch. The issue has been fixed on the main branch, and I believe these two PRs are needed to fix it: https://github.com/pytorch/builder/pull/1904 and https://github.com/pytorch/builder/pull/1870. To reproduce the issue:

# Use release/2.4 branch
GPU_ARCH_TYPE=cuda GPU_ARCH_VERSION=11.8 manywheel/build_docker.sh

Error:

get         which         xz         yasm:
0.763 Loaded plugins: fastestmirror, ovl
1.044 Determining fastest mirrors
1.171 Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
1.171 14: curl#6 - "Could not resolve host: mirrorlist.centos.org; Unknown error"
1.178
1.178
1.178  One of the configured repositories failed (Unknown),
1.178  and yum doesn't have enough cached data to continue. At this point the only
1.178  safe thing yum can do is fail. There are a few ways to work "fix" this:
1.178
1.178      1. Contact the upstream for the repository and get them to fix the problem.
1.178
1.178      2. Reconfigure the baseurl/etc. for the repository, to point to a working
1.178         upstream. This is most often useful if you are using a newer
1.178         distribution release than is supported by the repository (and the
1.178         packages for the previous distribution release still work).
1.178
1.178      3. Run the command with the repository temporarily disabled
1.178             yum --disablerepo=<repoid> ...
1.178
1.178      4. Disable the repository permanently, so yum won't use it by default. Yum
1.178         will then just ignore the repository until you permanently enable it
1.178         again or use --enablerepo for temporary usage:
1.178
1.178             yum-config-manager --disable <repoid>
1.178         or
1.178             subscription-manager repos --disable=<repoid>
1.178
1.178      5. Configure the failing repository to be skipped, if it is unavailable.
1.178         Note that yum will try to contact the repo. when it runs most commands,
1.178         so will have to try and fail each time (and thus. yum will be be much
1.178         slower). If it is a very temporary problem though, this is often a nice
1.178         compromise:
1.178
1.178             yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
1.178
1.178 Cannot find a valid baseurl for repo: base/7/x86_64
------
Dockerfile:86
--------------------
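For context, the log above shows yum failing because the host mirrorlist.centos.org no longer resolves; CentOS 7 reached end of life and its packages were moved to the vault.centos.org archive. A common workaround, sketched below, is to rewrite the yum repo files so they use a fixed vault base URL instead of the mirror list. This is only an illustration: the function name `point_yum_at_vault` is hypothetical, it assumes GNU sed and the stock CentOS 7 repo file layout, and it is not taken from the two PRs linked above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from pytorch/builder): make CentOS 7 yum repo files
# use the archived vault.centos.org mirror instead of mirrorlist.centos.org.
set -eu

point_yum_at_vault() {
    # First argument: directory holding the .repo files (defaults to yum's own).
    local repo_dir="${1:-/etc/yum.repos.d}"
    local f
    for f in "$repo_dir"/CentOS-*.repo; do
        # Skip the unexpanded pattern when no file matches.
        [ -e "$f" ] || continue
        # 1) Comment out mirrorlist= lines, which need the unresolvable host.
        # 2) Enable the commented-out baseurl and point it at the vault.
        sed -i \
            -e 's|^mirrorlist=|#mirrorlist=|' \
            -e 's|^#[[:space:]]*baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|' \
            "$f"
    done
}
```

In a Dockerfile, a function like this would need to run (as root, with no arguments) before the first `yum install` step; passing a directory argument makes it possible to try it on a copy of the repo files first.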

cc: @atalman

atalman commented 2 months ago

Hi @ajindal1, this PR should not affect the release, since we are using pinned Docker images here: https://hub.docker.com/r/pytorch/manylinux-builder/tags?page=&page_size=&ordering=&name=cuda12.1-2.4