mlcommons / training

Reference implementations of MLPerf™ training benchmarks
https://mlcommons.org/en/groups/training
Apache License 2.0

Failed to build object_detection container on Fedora 37 with the error below #690

Open gaowayne opened 7 months ago

gaowayne commented 7 months ago

Could you please shed some light on why I get this error?

[stg@oq1 object_detection]$ sudo nvidia-docker build . -t mlperf/object_detection
[+] Building 34.5s (13/13) FINISHED                                                                                                                                                                   docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                            0.0s
 => => transferring dockerfile: 2.67kB                                                                                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                                                               0.0s
 => => transferring context: 2B                                                                                                                                                                                 0.0s
 => [internal] load metadata for docker.io/pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel                                                                                                                         0.9s
 => [internal] load build context                                                                                                                                                                              23.6s
 => => transferring context: 2.17GB                                                                                                                                                                            23.6s
 => [1/9] FROM docker.io/pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel@sha256:913e6689c5958b187e65561e528ec6c3ce8a02deedcdd38cb50c9cab301907bb                                                                   0.0s
 => CACHED [2/9] RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections                                                                                                             0.0s
 => CACHED [3/9] RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub                                                                           0.0s
 => CACHED [4/9] RUN apt-get update -y  && apt-get install -y apt-utils                        libglib2.0-0=2.56.1-2ubuntu1                        libsm6=2:1.2.2-1                        libxext6=2:1.3.3-1   0.0s
 => CACHED [5/9] RUN pip install ninja==1.8.2.post2                 yacs==0.1.5                 cython==0.29.5                 matplotlib==3.0.2                 opencv-python==4.0.0.21                 mlper  0.0s
 => CACHED [6/9] RUN pip install --no-cache-dir https://github.com/mlperf/logging/archive/9ea0afa.zip                                                                                                           0.0s
 => CACHED [7/9] WORKDIR /workspace/object_detection                                                                                                                                                            0.0s
 => [8/9] COPY . .                                                                                                                                                                                              8.6s
 => ERROR [9/9] RUN cd pytorch  && rm -rf build/  && python setup.py clean build develop --user                                                                                                                 1.4s
------                                                                                                                                                                                                               
 > [9/9] RUN cd pytorch  && rm -rf build/  && python setup.py clean build develop --user:                                                                                                                            
1.103 No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'                                                                                                                                                    
1.103 running clean                                                                                                                                                                                                  
1.103 running build                                                                                                                                                                                                  
1.103 running build_py                                                                                                                                                                                               
1.104 creating build
1.104 creating build/lib.linux-x86_64-3.7
1.104 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark
1.104 copying maskrcnn_benchmark/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark
1.104 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
1.104 copying maskrcnn_benchmark/config/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
1.104 copying maskrcnn_benchmark/config/defaults.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
1.104 copying maskrcnn_benchmark/config/paths_catalog.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/config
1.105 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
1.105 copying maskrcnn_benchmark/data/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
1.105 copying maskrcnn_benchmark/data/build.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
1.105 copying maskrcnn_benchmark/data/collate_batch.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data
1.105 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
1.105 copying maskrcnn_benchmark/engine/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
1.105 copying maskrcnn_benchmark/engine/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
1.105 copying maskrcnn_benchmark/engine/tester.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
1.105 copying maskrcnn_benchmark/engine/trainer.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/engine
1.106 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.106 copying maskrcnn_benchmark/layers/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.106 copying maskrcnn_benchmark/layers/_utils.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.106 copying maskrcnn_benchmark/layers/batch_norm.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.106 copying maskrcnn_benchmark/layers/misc.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.106 copying maskrcnn_benchmark/layers/nms.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.106 copying maskrcnn_benchmark/layers/roi_align.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.106 copying maskrcnn_benchmark/layers/roi_pool.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.107 copying maskrcnn_benchmark/layers/sigmoid_focal_loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.107 copying maskrcnn_benchmark/layers/smooth_l1_loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/layers
1.107 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.107 copying maskrcnn_benchmark/modeling/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.107 copying maskrcnn_benchmark/modeling/balanced_positive_negative_sampler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.107 copying maskrcnn_benchmark/modeling/box_coder.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.107 copying maskrcnn_benchmark/modeling/make_layers.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.107 copying maskrcnn_benchmark/modeling/matcher.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.108 copying maskrcnn_benchmark/modeling/poolers.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.108 copying maskrcnn_benchmark/modeling/registry.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.108 copying maskrcnn_benchmark/modeling/utils.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling
1.108 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
1.108 copying maskrcnn_benchmark/solver/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
1.108 copying maskrcnn_benchmark/solver/build.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
1.108 copying maskrcnn_benchmark/solver/lr_scheduler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/solver
1.109 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
1.109 copying maskrcnn_benchmark/structures/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
1.109 copying maskrcnn_benchmark/structures/bounding_box.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
1.109 copying maskrcnn_benchmark/structures/boxlist_ops.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
1.109 copying maskrcnn_benchmark/structures/image_list.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
1.109 copying maskrcnn_benchmark/structures/keypoint.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
1.109 copying maskrcnn_benchmark/structures/segmentation_mask.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/structures
1.109 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/c2_model_loading.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/checkpoint.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/collect_env.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/comm.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/cv2_util.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/env.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/imports.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/logger.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.110 copying maskrcnn_benchmark/utils/metric_logger.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.111 copying maskrcnn_benchmark/utils/miscellaneous.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.111 copying maskrcnn_benchmark/utils/mlperf_logger.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.111 copying maskrcnn_benchmark/utils/model_serialization.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.111 copying maskrcnn_benchmark/utils/model_zoo.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.111 copying maskrcnn_benchmark/utils/registry.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/utils
1.111 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
1.111 copying maskrcnn_benchmark/data/datasets/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
1.111 copying maskrcnn_benchmark/data/datasets/coco.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
1.112 copying maskrcnn_benchmark/data/datasets/concat_dataset.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
1.112 copying maskrcnn_benchmark/data/datasets/list_dataset.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
1.112 copying maskrcnn_benchmark/data/datasets/voc.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets
1.112 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
1.112 copying maskrcnn_benchmark/data/samplers/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
1.112 copying maskrcnn_benchmark/data/samplers/distributed.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
1.112 copying maskrcnn_benchmark/data/samplers/grouped_batch_sampler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
1.112 copying maskrcnn_benchmark/data/samplers/iteration_based_batch_sampler.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/samplers
1.113 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
1.113 copying maskrcnn_benchmark/data/transforms/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
1.113 copying maskrcnn_benchmark/data/transforms/build.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
1.113 copying maskrcnn_benchmark/data/transforms/transforms.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/transforms
1.113 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation
1.113 copying maskrcnn_benchmark/data/datasets/evaluation/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation
1.113 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/coco
1.113 copying maskrcnn_benchmark/data/datasets/evaluation/coco/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/coco
1.113 copying maskrcnn_benchmark/data/datasets/evaluation/coco/coco_eval.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/coco
1.114 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/voc
1.114 copying maskrcnn_benchmark/data/datasets/evaluation/voc/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/voc
1.114 copying maskrcnn_benchmark/data/datasets/evaluation/voc/voc_eval.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/data/datasets/evaluation/voc
1.114 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
1.114 copying maskrcnn_benchmark/modeling/backbone/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
1.114 copying maskrcnn_benchmark/modeling/backbone/backbone.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
1.114 copying maskrcnn_benchmark/modeling/backbone/fpn.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
1.114 copying maskrcnn_benchmark/modeling/backbone/resnet.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/backbone
1.115 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
1.115 copying maskrcnn_benchmark/modeling/detector/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
1.115 copying maskrcnn_benchmark/modeling/detector/detectors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
1.115 copying maskrcnn_benchmark/modeling/detector/generalized_rcnn.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/detector
1.115 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads
1.115 copying maskrcnn_benchmark/modeling/roi_heads/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads
1.115 copying maskrcnn_benchmark/modeling/roi_heads/roi_heads.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads
1.115 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
1.115 copying maskrcnn_benchmark/modeling/rpn/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
1.116 copying maskrcnn_benchmark/modeling/rpn/anchor_generator.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
1.116 copying maskrcnn_benchmark/modeling/rpn/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
1.116 copying maskrcnn_benchmark/modeling/rpn/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
1.116 copying maskrcnn_benchmark/modeling/rpn/rpn.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
1.116 copying maskrcnn_benchmark/modeling/rpn/utils.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn
1.116 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
1.116 copying maskrcnn_benchmark/modeling/roi_heads/box_head/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
1.116 copying maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/box_head/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/box_head/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_feature_extractors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_predictors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/box_head
1.117 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/keypoint_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/keypoint_head/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/keypoint_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/keypoint_head/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/keypoint_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/keypoint_head/keypoint_head.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/keypoint_head
1.117 copying maskrcnn_benchmark/modeling/roi_heads/keypoint_head/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/keypoint_head
1.118 copying maskrcnn_benchmark/modeling/roi_heads/keypoin
1.118 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
1.118 copying maskrcnn_benchmark/modeling/roi_heads/mask_head/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
1.118 copying maskrcnn_benchmark/modeling/roi_heads/mask_head/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
1.118 copying maskrcnn_benchmark/modeling/roi_heads/mask_head/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
1.118 copying maskrcnn_benchmark/modeling/roi_heads/mask_head/mask_head.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
1.118 copying maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_feature_extractors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
1.119 copying maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_predictors.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/roi_heads/mask_head
1.119 creating build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn/retinanet
1.119 copying maskrcnn_benchmark/modeling/rpn/retinanet/__init__.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn/retinanet
1.119 copying maskrcnn_benchmark/modeling/rpn/retinanet/inference.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn/retinanet
1.119 copying maskrcnn_benchmark/modeling/rpn/retinanet/loss.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn/retinanet
1.119 copying maskrcnn_benchmark/modeling/rpn/retinanet/retinanet.py -> build/lib.linux-x86_64-3.7/maskrcnn_benchmark/modeling/rpn/retinanet
1.121 running build_ext
1.178 building 'maskrcnn_benchmark._C' extension
1.178 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7
1.178 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7/workspace
1.179 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7/workspace/object_detection
1.179 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7/workspace/object_detection/pytorch
1.179 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7/workspace/object_detection/pytorch/maskrcnn_benchmark
1.179 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7/workspace/object_detection/pytorch/maskrcnn_benchmark/csrc
1.179 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7/workspace/object_detection/pytorch/maskrcnn_benchmark/csrc/cpu
1.179 creating /workspace/object_detection/pytorch/build/temp.linux-x86_64-3.7/workspace/object_detection/pytorch/maskrcnn_benchmark/csrc/cuda
1.181 Traceback (most recent call last):
1.181   File "setup.py", line 81, in <module>
1.181     cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
1.181   File "/opt/conda/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
1.181     return distutils.core.setup(**attrs)
1.181   File "/opt/conda/lib/python3.7/distutils/core.py", line 148, in setup
1.181     dist.run_commands()
1.181   File "/opt/conda/lib/python3.7/distutils/dist.py", line 966, in run_commands
1.181     self.run_command(cmd)
1.181   File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
1.181     cmd_obj.run()
1.181   File "/opt/conda/lib/python3.7/distutils/command/build.py", line 135, in run
1.181     self.run_command(cmd_name)
1.181   File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
1.181     self.distribution.run_command(command)
1.181   File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
1.181     cmd_obj.run()
1.181   File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
1.181     _build_ext.run(self)
1.181   File "/opt/conda/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
1.181     _build_ext.build_ext.run(self)
1.181   File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 340, in run
1.181     self.build_extensions()
1.181   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
1.181     build_ext.build_extensions(self)
1.181   File "/opt/conda/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
1.181     self.build_extension(ext)
1.181   File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
1.181     _build_ext.build_extension(self, ext)
1.181   File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
1.181     depends=ext.depends)
1.181   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 551, in unix_wrap_ninja_compile
1.181     cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
1.181   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 450, in unix_cuda_flags
1.181     cflags + _get_cuda_arch_flags(cflags))
1.181   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
1.181     arch_list[-1] += '+PTX'
1.181 IndexError: list index out of range
------
Dockerfile:27
--------------------
  26 |     COPY . .
  27 | >>> RUN cd pytorch \
  28 | >>>  && rm -rf build/ \
  29 | >>>  && python setup.py clean build develop --user
  30 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c cd pytorch  && rm -rf build/  && python setup.py clean build develop --user" did not complete successfully: exit code: 1
[stg@oq1 object_detection]$ 
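
Note on the traceback above: the log reports "No CUDA runtime is found", and the failure is `arch_list[-1] += '+PTX'` inside `_get_cuda_arch_flags`. That matches a case where no GPU is visible during `docker build` and no architecture list is supplied through the `TORCH_CUDA_ARCH_LIST` environment variable, so the list of target architectures ends up empty. The sketch below is a simplified model of that logic, not PyTorch's actual code; the function name `cuda_arch_list` and its parameters are hypothetical and exist only for illustration.

```python
def cuda_arch_list(env, visible_device_capabilities):
    """Simplified, hypothetical model of how a CUDA arch list could be built.

    env: mapping of environment variables (e.g. os.environ).
    visible_device_capabilities: list of (major, minor) tuples, one per
        GPU visible to the process; empty when no GPU is visible.
    """
    raw = env.get("TORCH_CUDA_ARCH_LIST")
    if raw:
        # Explicit list such as "7.0;8.0" or "7.0 8.0": use it as given.
        return [arch for arch in raw.replace(" ", ";").split(";") if arch]

    # No explicit list: fall back to the GPUs that are visible right now.
    arch_list = sorted({f"{major}.{minor}" for major, minor in visible_device_capabilities})
    # With zero visible GPUs the list is empty, so this line raises
    # IndexError, which mirrors the last frame of the traceback.
    arch_list[-1] += "+PTX"
    return arch_list


if __name__ == "__main__":
    # An explicit list is used even when no GPU is visible.
    print(cuda_arch_list({"TORCH_CUDA_ARCH_LIST": "7.0;8.0"}, []))
    try:
        # No env variable and no visible GPU: same failure mode as the log.
        cuda_arch_list({}, [])
    except IndexError as exc:
        print("IndexError:", exc)
```

If this reading is right, one commonly suggested workaround (untested here) is to add an `ENV TORCH_CUDA_ARCH_LIST=...` line to the Dockerfile before the `python setup.py` step, with a value that matches the compute capability of the target GPU.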
gaowayne commented 6 months ago

I ran this on Ubuntu 22.04 and got the same error:

1.392     self.run_command(cmd)
1.392   File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
1.392     cmd_obj.run()
1.392   File "/opt/conda/lib/python3.7/distutils/command/build.py", line 135, in run
1.392     self.run_command(cmd_name)
1.392   File "/opt/conda/lib/python3.7/distutils/cmd.py", line 313, in run_command
1.392     self.distribution.run_command(command)
1.392   File "/opt/conda/lib/python3.7/distutils/dist.py", line 985, in run_command
1.392     cmd_obj.run()
1.392   File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
1.392     _build_ext.run(self)
1.392   File "/opt/conda/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
1.392     _build_ext.build_ext.run(self)
1.392   File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 340, in run
1.392     self.build_extensions()
1.392   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 735, in build_extensions
1.392     build_ext.build_extensions(self)
1.392   File "/opt/conda/lib/python3.7/site-packages/Cython/Distutils/old_build_ext.py", line 194, in build_extensions
1.392     self.build_extension(ext)
1.392   File "/opt/conda/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
1.392     _build_ext.build_extension(self, ext)
1.392   File "/opt/conda/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
1.392     depends=ext.depends)
1.392   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 551, in unix_wrap_ninja_compile
1.392     cuda_post_cflags = unix_cuda_flags(cuda_post_cflags)
1.392   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 450, in unix_cuda_flags
1.392     cflags + _get_cuda_arch_flags(cflags))
1.392   File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
1.392     arch_list[-1] += '+PTX'
1.392 IndexError: list index out of range
------
Dockerfile:27
--------------------
  26 |     COPY . .
  27 | >>> RUN cd pytorch \
  28 | >>>  && rm -rf build/ \
  29 | >>>  && python setup.py clean build develop --user
  30 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c cd pytorch  && rm -rf build/  && python setup.py clean build develop --user" did not complete successfully: exit code: 1
dcg@oq1:/mnt/nvme1n1/mlperf/ubuntu/training/object_detection$ 
J-StrawHat commented 4 months ago

I also encountered this problem and solved it by referring to issue #619.

gorleramyasri commented 4 months ago

I am also facing the same error and have not been able to solve it.