open-mmlab / mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark
https://mmpretrain.readthedocs.io/en/latest/
Apache License 2.0
3.5k stars · 1.08k forks

[Bug] module 'mmcv' has no attribute 'mkdir_or_exist' #1141

Closed marouaneamz closed 2 years ago

marouaneamz commented 2 years ago

Branch

1.x branch (1.0.0rc2 or other 1.x version)

Describe the bug

I think you forgot to change `mmcv` to `mmengine` here: https://github.com/open-mmlab/mmclassification/blob/1.x/tools/deployment/mmcls2torchserve.py#L6
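For context, `mkdir_or_exist` moved from mmcv to mmengine in the 1.x stack, so `mmcv.mkdir_or_exist` no longer exists. It is essentially a thin wrapper over `os.makedirs`; a minimal stand-in (an approximation, not the actual mmengine source) behaves like this:

```python
import os
import tempfile

def mkdir_or_exist(dir_name: str, mode: int = 0o777) -> None:
    """Create a directory if it does not already exist.

    Approximates mmengine.utils.mkdir_or_exist; an empty path is a no-op.
    """
    if dir_name == "":
        return
    dir_name = os.path.expanduser(dir_name)
    os.makedirs(dir_name, mode=mode, exist_ok=True)

# Calling it twice on the same path must not raise.
target = os.path.join(tempfile.mkdtemp(), "model-store")
mkdir_or_exist(target)
mkdir_or_exist(target)
print(os.path.isdir(target))  # True
```

In the script itself, the fix is just replacing the `import mmcv` usage with the equivalent import from `mmengine.utils`.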

Environment

{'sys.platform': 'linux',
 'Python': '3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]',
 'CUDA available': True,
 'numpy_random_seed': 2147483648,
 'GPU 0': 'NVIDIA GeForce RTX 2060 with Max-Q Design',
 'CUDA_HOME': '/usr/local/cuda',
 'NVCC': 'Cuda compilation tools, release 11.0, V11.0.194',
 'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0',
 'PyTorch': '1.12.1+cu102',
 'PyTorch compiling details': 'PyTorch built with:\n'
                              '  - GCC 7.3\n'
                              '  - C++ Version: 201402\n'
                              '  - Intel(R) oneAPI Math Kernel Library Version '
                              '2021.4-Product Build 20210904 for Intel(R) 64 '
                              'architecture applications\n'
                              '  - Intel(R) MKL-DNN v2.6.0 (Git Hash '
                              '52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n'
                              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
                              '  - LAPACK is enabled (usually provided by '
                              'MKL)\n'
                              '  - NNPACK is enabled\n'
                              '  - CPU capability usage: AVX2\n'
                              '  - CUDA Runtime 10.2\n'
                              '  - NVCC architecture flags: '
                              '-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70\n'
                              '  - CuDNN 7.6.5\n'
                              '  - Magma 2.5.2\n'
                              '  - Build settings: BLAS_INFO=mkl, '
                              'BUILD_TYPE=Release, CUDA_VERSION=10.2, '
                              'CUDNN_VERSION=7.6.5, '
                              'CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, '
                              'CXX_FLAGS= -fabi-version=11 -Wno-deprecated '
                              '-fvisibility-inlines-hidden -DUSE_PTHREADPOOL '
                              '-fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM '
                              '-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
                              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
                              '-DEDGE_PROFILER_USE_KINETO -O2 -fPIC '
                              '-Wno-narrowing -Wall -Wextra '
                              '-Werror=return-type '
                              '-Wno-missing-field-initializers '
                              '-Wno-type-limits -Wno-array-bounds '
                              '-Wno-unknown-pragmas -Wno-unused-parameter '
                              '-Wno-unused-function -Wno-unused-result '
                              '-Wno-unused-local-typedefs -Wno-strict-overflow '
                              '-Wno-strict-aliasing '
                              '-Wno-error=deprecated-declarations '
                              '-Wno-stringop-overflow -Wno-psabi '
                              '-Wno-error=pedantic -Wno-error=redundant-decls '
                              '-Wno-error=old-style-cast '
                              '-fdiagnostics-color=always -faligned-new '
                              '-Wno-unused-but-set-variable '
                              '-Wno-maybe-uninitialized -fno-math-errno '
                              '-fno-trapping-math -Werror=format '
                              '-Wno-stringop-overflow, LAPACK_INFO=mkl, '
                              'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
                              'PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, '
                              'USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, '
                              'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, '
                              'USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, '
                              'USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n',
 'TorchVision': '0.13.1+cu102',
 'OpenCV': '4.6.0',
 'MMEngine': '0.2.0',
 'MMClassification': '1.0.0rc2+b855bc0'}

Other information

No response

Ezra-Yu commented 2 years ago

Yes. Thank you for your report. Can you fix that and create a PR?

marouaneamz commented 2 years ago

Yes, of course.

marouaneamz commented 2 years ago

@Ezra-Yu here is the PR: https://github.com/open-mmlab/mmclassification/pull/1143

Ezra-Yu commented 2 years ago

Good Job! I will test it.

marouaneamz commented 2 years ago

@Ezra-Yu did you test the inference after deployment? https://github.com/open-mmlab/mmclassification/pull/1143#issuecomment-1292230518

Ezra-Yu commented 2 years ago

@marouaneamz Sorry for the late reply. (I had to download and install Docker, then debug it while testing.)

Yes, I have tested it after deployment, following this tutorial, and there are errors besides the one you mentioned here.

> @Ezra-Yu In my understanding, the `default_scope` for the registry is initialized in the runner. To use mmcls inference in a deployment server, it must be run with a runner or with a hardcoded `default_scope`.

You are right, there are some errors when testing. The PRs https://github.com/open-mmlab/mmclassification/pull/1139 and https://github.com/open-mmlab/mmclassification/pull/1118 will solve this problem.

If you really want to run the example, you can modify the docker/serve/Dockerfile as follows:

ARG PYTORCH="1.8.1"
ARG CUDA="10.2"
ARG CUDNN="7"
FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel

# fetch the key refer to https://forums.developer.nvidia.com/t/18-04-cuda-docker-image-is-broken/212892/9
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub

ARG MMENGINE="0.2.0"
ARG MMCV="2.0.0rc1"
ARG MMCLS="1.0.0rc2"

ENV PYTHONUNBUFFERED TRUE

RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -y \
    ca-certificates \
    g++ \
    openjdk-11-jre-headless \
    # MMDet Requirements
    ffmpeg libsm6 libxext6 git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \
    && rm -rf /var/lib/apt/lists/*

ENV PATH="/opt/conda/bin:$PATH"
RUN export FORCE_CUDA=1

# TORCHSEVER
RUN pip install torchserve torch-model-archiver
RUN pip install nvgpu

# MMLAB
ARG PYTORCH
ARG CUDA
RUN pip install mmengine==${MMENGINE}
RUN ["/bin/bash", "-c", "pip install mmcv==${MMCV} -f https://download.openmmlab.com/mmcv/dist/cu${CUDA//./}/torch${PYTORCH}/index.html"]
RUN pip3 install git+https://github.com/mzr1996/mmclassification.git@1x-model-pages
# this branch has solved that
# RUN pip install mmcls==${MMCLS}

RUN useradd -m model-server \
    && mkdir -p /home/model-server/tmp

COPY entrypoint.sh /usr/local/bin/entrypoint.sh

RUN chmod +x /usr/local/bin/entrypoint.sh \
    && chown -R model-server /home/model-server

COPY config.properties /home/model-server/config.properties
RUN mkdir /home/model-server/model-store && chown -R model-server /home/model-server/model-store

EXPOSE 8080 8081 8082

USER model-server
WORKDIR /home/model-server
ENV TEMP=/home/model-server/tmp
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["serve"]

In my env, it works fine: [screenshot]
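For anyone reproducing this, the end-to-end check looks roughly like the following (the image tag, config/checkpoint paths, and model name are placeholders; adjust them to your setup):

```shell
# Build the serving image from the modified Dockerfile
docker build -t mmcls-serve docker/serve/

# Convert a checkpoint to a TorchServe .mar archive
python tools/deployment/mmcls2torchserve.py ${CONFIG_FILE} ${CHECKPOINT_FILE} \
    --output-folder ./model-store --model-name my_model

# Serve the archive and query the prediction endpoint
docker run --rm -p 8080:8080 \
    -v "$(pwd)/model-store:/home/model-server/model-store" mmcls-serve
curl http://127.0.0.1:8080/predictions/my_model -T demo/demo.JPEG
```

This is a deployment sketch under the assumptions above, not a tested recipe; the inference error discussed in this thread shows up at the final `curl` step.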

Ezra-Yu commented 2 years ago

We will fix it in the 1.x branch in the next version.

Ezra-Yu commented 2 years ago

Fixed in #1143.