stark-t / PAI

Pollination_Artificial_Intelligence

ScaledYOLOv4 - ModuleNotFoundError: No module named 'mish_cuda' #36

Closed: valentinitnelav closed this issue 2 years ago

valentinitnelav commented 2 years ago

I have an issue setting up the proper environment for ScaledYOLOv4.

I followed the suggestion to install the dependencies listed for YOLOR as pointed out here.

I managed to do that, but when I ran the first training job on the cluster, it failed with the error ModuleNotFoundError: No module named 'mish_cuda'.
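A quicker way to reproduce this than submitting a training job is to try the import directly in the activated environment. The helper below is a minimal, hypothetical sketch (check_module is not part of ScaledYOLOv4 or of the cluster setup). It assumes that python3 on the PATH is the venv's interpreter, which holds after source ~/venv/ScaledYOLOv4/bin/activate. It performs a real import rather than only looking the package up, so a mish_cuda package whose compiled extension cannot be loaded is also reported as a failure.

```shell
# Hypothetical helper: report whether the python3 currently on PATH
# (e.g. the activated ScaledYOLOv4 venv) can import a given module.
check_module() {
  local mod="$1"
  # importlib.import_module performs a real import, so a package whose
  # compiled extension cannot be loaded is reported as a failure too.
  if python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$mod" 2>/dev/null; then
    echo "$mod: ok"
  else
    echo "$mod: import failed"
  fi
}
```

For example, check_module mish_cuda prints "mish_cuda: import failed" as long as the package is missing from the environment.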

I looked at similar issues, and the suggestions there were to install one of these two versions of mish_cuda:

I tried both in the environment set up for ScaledYOLOv4:

module purge
module load Python/3.8.6-GCCcore-10.2.0
source ~/venv/ScaledYOLOv4/bin/activate

pip install git+https://github.com/thomasbrandon/mish-cuda/
# or
pip install git+https://github.com/JunnYu/mish-cuda

deactivate
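Each failed pip install prints more than a hundred lines, and the informative part is usually the first compiler diagnostic rather than pip's own summary at the end. A minimal sketch for saving the output and pulling out that line, assuming gcc-style (file:line:col: error) or nvcc-style (file(line): error) messages; the function name first_compile_error and the log file name below are hypothetical:

```shell
# Hypothetical helper: print the first gcc- or nvcc-style error line
# (prefixed with its line number in the log) from a saved build log.
first_compile_error() {
  local log="$1"
  # Matches e.g. "csrc/foo.cpp:12:10: fatal error: ..." (gcc)
  # and "csrc/foo.cu(45): error: ..." (nvcc); pip's own
  # "error: subprocess-exited-with-error" lines are not matched.
  grep -n -m 1 -E '\.(c|cc|cpp|cxx|cu|cuh|h|hpp)(:[0-9]+(:[0-9]+)?:|\([0-9]+\):) (fatal )?error' "$log" \
    || echo "no compiler error line found in $log"
}
```

Usage, inside the activated venv: pip install git+https://github.com/JunnYu/mish-cuda 2>&1 | tee mish_cuda_build.log, then first_compile_error mish_cuda_build.log. The pattern is deliberately narrow, so warnings and pip's summary lines are skipped.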

I will ask the cluster team for support.

Both attempts above failed with long error messages; the output of the first one follows:

(ScaledYOLOv4) [@login01 ~]$ pip install git+https://github.com/thomasbrandon/mish-cuda/

Collecting git+https://github.com/thomasbrandon/mish-cuda/
  Cloning https://github.com/thomasbrandon/mish-cuda/ to /tmp/pip-req-build-8zuc3v94
  Running command git clone --quiet https://github.com/thomasbrandon/mish-cuda/ /tmp/pip-req-build-8zuc3v94
  Resolved https://github.com/thomasbrandon/mish-cuda/ to commit c54271c725d57af62968e960598ffedd4896ef94
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch>=1.2 in ./venv/ScaledYOLOv4/lib/python3.8/site-packages (from mish-cuda==0.0.3) (1.7.0)
Requirement already satisfied: numpy in ./venv/ScaledYOLOv4/lib/python3.8/site-packages (from torch>=1.2->mish-cuda==0.0.3) (1.23.1)
Requirement already satisfied: typing-extensions in ./venv/ScaledYOLOv4/lib/python3.8/site-packages (from torch>=1.2->mish-cuda==0.0.3) (4.3.0)
Requirement already satisfied: dataclasses in ./venv/ScaledYOLOv4/lib/python3.8/site-packages (from torch>=1.2->mish-cuda==0.0.3) (0.6)
Requirement already satisfied: future in ./venv/ScaledYOLOv4/lib/python3.8/site-packages (from torch>=1.2->mish-cuda==0.0.3) (0.18.2)
Building wheels for collected packages: mish-cuda
  Building wheel for mish-cuda (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [123 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/mish_cuda
      copying src/mish_cuda/__init__.py -> build/lib.linux-x86_64-3.8/mish_cuda
      running egg_info
      creating src/mish_cuda.egg-info
      writing src/mish_cuda.egg-info/PKG-INFO
      writing dependency_links to src/mish_cuda.egg-info/dependency_links.txt
      writing requirements to src/mish_cuda.egg-info/requires.txt
      writing top-level names to src/mish_cuda.egg-info/top_level.txt
      writing manifest file 'src/mish_cuda.egg-info/SOURCES.txt'
      reading manifest file 'src/mish_cuda.egg-info/SOURCES.txt'
      writing manifest file 'src/mish_cuda.egg-info/SOURCES.txt'
      running build_ext
      building 'mish_cuda._C' extension
      creating /tmp/pip-req-build-8zuc3v94/build/temp.linux-x86_64-3.8
      creating /tmp/pip-req-build-8zuc3v94/build/temp.linux-x86_64-3.8/csrc
      /venv/ScaledYOLOv4/lib/python3.8/site-packages/torch/utils/cpp_extension.py:253: UserWarning:

                                     !! WARNING !!

      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      Your compiler (c++) is not compatible with the compiler Pytorch was
      built with for this platform, which is g++ on linux. Please
      use g++ to to compile your extension. Alternatively, you may
      compile PyTorch from source using c++, and then you can also use
      c++ to compile your extension.

      See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
      with compiling PyTorch from source.
      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                                    !! WARNING !!

        warnings.warn(WRONG_COMPILER_WARNING.format(
      Emitting ninja build file /tmp/pip-req-build-8zuc3v94/build/temp.linux-x86_64-3.8/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
      [1/3] /usr/local/cuda/bin/nvcc -Iexternal 
.
.
.
         84 | #pragma omp parallel for if ((end - begin) >= grain_size)
            |
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/venv/ScaledYOLOv4/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1516, in _run_ninja_build
          subprocess.run(
        File "/software/all/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/subprocess.py", line 512, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-8zuc3v94/setup.py", line 10, in <module>
.
.
.  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for mish-cuda
  Running setup.py clean for mish-cuda
Failed to build mish-cuda
Installing collected packages: mish-cuda
  Running setup.py install for mish-cuda ... error
  error: subprocess-exited-with-error

  × Running setup.py install for mish-cuda did not run successfully.
  │ exit code: 1
  ╰─> [123 lines of output]
      running install
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/mish_cuda
      copying src/mish_cuda/__init__.py -> build/lib.linux-x86_64-3.8/mish_cuda
      running egg_info
      writing src/mish_cuda.egg-info/PKG-INFO
      writing dependency_links to src/mish_cuda.egg-info/dependency_links.txt
      writing requirements to src/mish_cuda.egg-info/requires.txt
      writing top-level names to src/mish_cuda.egg-info/top_level.txt
      reading manifest file 'src/mish_cuda.egg-info/SOURCES.txt'
      writing manifest file 'src/mish_cuda.egg-info/SOURCES.txt'
      running build_ext
      building 'mish_cuda._C' extension
      creating /tmp/pip-req-build-8zuc3v94/build/temp.linux-x86_64-3.8
      creating /tmp/pip-req-build-8zuc3v94/build/temp.linux-x86_64-3.8/csrc
      /venv/ScaledYOLOv4/lib/python3.8/site-packages/torch/utils/cpp_extension.py:253: UserWarning:

                                     !! WARNING !!

      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      Your compiler (c++) is not compatible with the compiler Pytorch was
      built with for this platform, which is g++ on linux. Please
      use g++ to to compile your extension. Alternatively, you may
      compile PyTorch from source using c++, and then you can also use
      c++ to compile your extension.

      See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
      with compiling PyTorch from source.
      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                                    !! WARNING !!

        warnings.warn(WRONG_COMPILER_WARNING.format(
      Emitting ninja build file /tmp/pip-req-build-8zuc3v94/build/temp.linux-x86_64-3.8/build.ninja...
      Compiling objects...
      Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
.
.
.
         84 | #pragma omp parallel for if ((end - begin) >= grain_size)
            |
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/venv/ScaledYOLOv4/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1516, in _run_ninja_build
          subprocess.run(
        File "/software/all/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/subprocess.py", line 512, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-8zuc3v94/setup.py", line 10, in <module>
...
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> mish-cuda

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
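Two hints in this log can be acted on directly: the UserWarning asks for g++ instead of c++, and the ninja step notes that MAX_JOBS=N limits the number of parallel compile jobs. PyTorch's torch.utils.cpp_extension should take the compiler from the CXX environment variable, so a retry with CXX=g++ is a cheap experiment. This is a hedged sketch only: it is not established that the warning caused the failure (in this thread the problem was solved differently, by switching to the cluster's PyTorch module), and the function name build_mish_cuda, the DRY_RUN switch and the default of 4 jobs are assumptions.

```shell
# Hypothetical retry of the mish-cuda build with an explicit C++ compiler and
# a capped number of parallel ninja jobs. Run inside the activated venv.
build_mish_cuda() {
  local repo="${1:-git+https://github.com/JunnYu/mish-cuda}"
  # Assumption: 4 parallel jobs is modest enough for a shared login node.
  local jobs="${MAX_JOBS:-4}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show the command that would run.
    echo "CXX=g++ MAX_JOBS=$jobs pip install $repo"
  else
    CXX=g++ MAX_JOBS="$jobs" pip install "$repo"
  fi
}
```

DRY_RUN=1 build_mish_cuda prints the command without running it; call build_mish_cuda without DRY_RUN to actually build, optionally passing the other repository URL as the first argument.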
valentinitnelav commented 2 years ago

Fixed with commit bccd14799ac81b54053b762805eda1582d701fa1

I had to use specific versions of the packages available in the software tree provided by the Leipzig University Computing Centre.

To create a working environment for ScaledYOLOv4, I followed these steps:

module purge
module load PyTorch/1.7.1-fosscuda-2019b-Python-3.7.4
module load TensorFlow/2.4.0-fosscuda-2019b-Python-3.7.4
module load OpenCV/4.2.0-fosscuda-2019b-Python-3.7.4
module load matplotlib/3.1.1-fosscuda-2019b-Python-3.7.4
module load torchvision/0.8.2-fosscuda-2019b-Python-3.7.4-PyTorch-1.7.1
module load tqdm

# Create environment
python -m venv ~/venv/ScaledYOLOv4
# Activate environment
source ~/venv/ScaledYOLOv4/bin/activate

# mish-cuda installation as suggested at https://github.com/WongKinYiu/ScaledYOLOv4#installation
pip install git+https://github.com/JunnYu/mish-cuda

pip install seaborn
pip install thop
pip install pycocotools

deactivate
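Because the venv was created on top of the module-provided Python 3.7.4, and several packages (for example matplotlib and SciPy in the seaborn log below) are picked up from the module tree rather than from the venv itself, every later shell or batch job presumably needs the same module load lines before activating the venv. A minimal sketch that writes them, together with the activation, into one file to source; the helper name write_env_script and the file name load_env.sh are hypothetical:

```shell
# Hypothetical helper: write a file that restores the ScaledYOLOv4 setup
# (same modules as at creation time, then the venv) for later shells or jobs.
write_env_script() {
  local target="${1:-$HOME/venv/ScaledYOLOv4/load_env.sh}"
  # Quoted heredoc: nothing (not even ~) is expanded while writing the file.
  cat > "$target" <<'EOF'
# Source this file; executing it would only change a subshell.
module purge
module load PyTorch/1.7.1-fosscuda-2019b-Python-3.7.4
module load TensorFlow/2.4.0-fosscuda-2019b-Python-3.7.4
module load OpenCV/4.2.0-fosscuda-2019b-Python-3.7.4
module load matplotlib/3.1.1-fosscuda-2019b-Python-3.7.4
module load torchvision/0.8.2-fosscuda-2019b-Python-3.7.4-PyTorch-1.7.1
module load tqdm
source ~/venv/ScaledYOLOv4/bin/activate
EOF
}
```

Run write_env_script once; afterwards, source ~/venv/ScaledYOLOv4/load_env.sh at the top of a job script, optionally followed by python -c "import torch, mish_cuda" as a sanity check before training starts.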

However, now we have other issues - not enough RAM :D


Note for the archive:

When I installed seaborn, I got these warning messages, but the installation was successful:

pip install seaborn

Collecting seaborn
  Using cached https://files.pythonhosted.org/packages/10/5b/0479d7d845b5ba410ca702ffcd7f2cd95a14a4dfff1fde2637802b258b9b/seaborn-0.11.2-py3-none-any.whl
Requirement already satisfied: matplotlib>=2.2 in /software/all/matplotlib/3.1.1-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages (from seaborn) (3.1.1)
Requirement already satisfied: scipy>=1.0 in /software/all/SciPy-bundle/2019.10-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages (from seaborn) (1.3.1)
Requirement already satisfied: pandas>=0.23 in /software/all/SciPy-bundle/2019.10-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages (from seaborn) (0.25.3)
Requirement already satisfied: numpy>=1.15 in /software/all/SciPy-bundle/2019.10-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages (from seaborn) (1.17.3)
Requirement already satisfied: kiwisolver>=1.0.1 in /software/all/matplotlib/3.1.1-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (1.1.0)
Collecting python-dateutil>=2.1 (from matplotlib>=2.2->seaborn)
  Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl
Requirement already satisfied: cycler>=0.10 in /software/all/matplotlib/3.1.1-fosscuda-2019b-Python-3.7.4/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (0.10.0)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from matplotlib>=2.2->seaborn)
  Using cached https://files.pythonhosted.org/packages/6c/10/a7d0fa5baea8fe7b50f448ab742f26f52b80bfca85ac2be9d35cdd9a3246/pyparsing-3.0.9-py3-none-any.whl
Collecting pytz>=2017.2 (from pandas>=0.23->seaborn)
  Using cached https://files.pythonhosted.org/packages/60/2e/dec1cc18c51b8df33c7c4d0a321b084cf38e1733b98f9d15018880fb4970/pytz-2022.1-py2.py3-none-any.whl
Requirement already satisfied: setuptools in ./venv/ScaledYOLOv4/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=2.2->seaborn) (40.8.0)
Collecting six>=1.5 (from python-dateutil>=2.1->matplotlib>=2.2->seaborn)
  Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl
tensorflow 2.4.0 requires wheel>=0.26, which is not installed.
tensorboard 2.4.0 requires requests<3,>=2.21.0, which is not installed.
tensorboard 2.4.0 requires wheel>=0.26; python_version >= "3", which is not installed.
astunparse 1.6.3 requires wheel<1.0,>=0.23.0, which is not installed.
tensorboard 2.4.0 has requirement setuptools>=41.0.0, but you'll have setuptools 40.8.0 which is incompatible.
tensorboard-plugin-profile 2.4.0 has requirement setuptools>=41.0.0, but you'll have setuptools 40.8.0 which is incompatible.
Installing collected packages: seaborn, six, python-dateutil, pyparsing, pytz
Successfully installed pyparsing-3.0.9 python-dateutil-2.8.2 pytz-2022.1 seaborn-0.11.2 six-1.16.0
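The installation succeeded, but pip also reported unmet requirements of the module-provided TensorFlow/TensorBoard (wheel, requests and a newer setuptools). If they ever cause trouble, one option is to turn those warnings into a list of requirement specifiers to install into the venv. A minimal sketch, assuming the pip output was saved to a file; the helper name conflict_requirements is hypothetical, and the sed patterns only cover the two message forms shown above:

```shell
# Hypothetical helper: turn pip's dependency-conflict warnings into a
# de-duplicated list of requirement specifiers.
conflict_requirements() {
  # Form 1: "<pkg> <ver> requires <spec>, which is not installed."
  # Form 2: "<pkg> <ver> has requirement <spec>, but you'll have ..."
  # The second sed drops environment markers such as '; python_version >= "3"',
  # and LC_ALL=C makes the de-duplicating sort locale-independent.
  sed -n \
    -e 's/^[^ ]* [^ ]* requires \(.*\), which is not installed\.$/\1/p' \
    -e "s/^[^ ]* [^ ]* has requirement \(.*\), but you'll have .*/\1/p" \
    "$1" | sed 's/;.*//' | LC_ALL=C sort -u
}
```

For the warnings above this yields requests<3,>=2.21.0, setuptools>=41.0.0 and two wheel specifiers. Treat the list as something to review rather than pass straight to pip install -r: it can name the same package twice (wheel here), which older pip releases may reject as a double requirement. After installing the merged specifiers, pip check reports whether any conflicts remain.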