open-mmlab / mmengine

OpenMMLab Foundational Library for Training Deep Learning Models
https://mmengine.readthedocs.io/
Apache License 2.0

[Bug] KeyError: 'ASGD is already registered in optimizer at torch.optim.asgd' (when running pytest) #1593

Open GaetanLepage opened 1 week ago

GaetanLepage commented 1 week ago

Prerequisite

Environment

OrderedDict([('sys.platform', 'linux'),
             ('Python', '3.11.10 (main, Sep  7 2024, 01:03:31) [GCC 13.3.0]'),
             ('CUDA available', False),
             ('MUSA available', False),
             ('numpy_random_seed', 2147483648),
             ('GCC', 'gcc (GCC) 13.3.0'),
             ('PyTorch', '2.5.0'),
             ('PyTorch compiling details',
              'PyTorch built with:\n'
              '  - GCC 13.3\n'
              '  - C++ Version: 201703\n'
              '  - Intel(R) MKL-DNN v3.5.3 (Git Hash N/A)\n'
              '  - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
              '  - LAPACK is enabled (usually provided by MKL)\n'
              '  - CPU capability usage: AVX2\n'
              '  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, '
              'CXX_COMPILER=/nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-gcc-wrapper-13.3.0/bin/g++, '
              'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=1 '
              '-fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG '
              '-DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER '
              '-DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
              '-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall '
              '-Wextra -Werror=return-type -Werror=non-virtual-dtor '
              '-Werror=range-loop-construct -Werror=bool-operation -Wnarrowing '
              '-Wno-missing-field-initializers -Wno-type-limits '
              '-Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter '
              '-Wno-strict-overflow -Wno-strict-aliasing '
              '-Wno-stringop-overflow -Wsuggest-override -Wno-psabi '
              '-Wno-error=old-style-cast -Wno-missing-braces '
              '-fdiagnostics-color=always -faligned-new '
              '-Wno-unused-but-set-variable -Wno-maybe-uninitialized '
              '-fno-math-errno -fno-trapping-math -Werror=format '
              '-Wno-stringop-overflow, LAPACK_INFO=open, PERF_WITH_AVX=1, '
              'PERF_WITH_AVX2=1, TORCH_VERSION=2.5.0, USE_CUDA=OFF, '
              'USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EIGEN_FOR_BLAS=ON, '
              'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
              'USE_MKL=OFF, USE_MKLDNN=1, USE_MPI=OFF, USE_NCCL=OFF, '
              'USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, '
              'USE_ROCM_KERNEL_ASSERT=OFF, \n'),
             ('OpenCV', '4.9.0'),
             ('MMEngine', '0.10.5')])

Reproduces the problem - code sample

Not applicable.

Reproduces the problem - command or script

pytest

Reproduces the problem - error message

KeyError: 'ASGD is already registered in optimizer at torch.optim.asgd'

Additional information

  1. Expected result: Tests run successfully
  2. Dataset: not applicable
  3. Suspected cause: new version of PyTorch (2.5.0)

All the tests fail in the same way:

________ ERROR collecting tests/test_hooks/test_early_stopping_hook.py _________
tests/test_hooks/test_early_stopping_hook.py:13: in <module>
    from mmengine.hooks import EarlyStoppingHook
mmengine/hooks/__init__.py:4: in <module>
    from .ema_hook import EMAHook
mmengine/hooks/ema_hook.py:8: in <module>
    from mmengine.model import is_model_wrapper
mmengine/model/__init__.py:6: in <module>
    from .base_model import BaseDataPreprocessor, BaseModel, ImgDataPreprocessor
mmengine/model/base_model/__init__.py:2: in <module>
    from .base_model import BaseModel
mmengine/model/base_model/base_model.py:9: in <module>
    from mmengine.optim import OptimWrapper
mmengine/optim/__init__.py:2: in <module>
    from .optimizer import (OPTIM_WRAPPER_CONSTRUCTORS, OPTIMIZERS,
mmengine/optim/optimizer/__init__.py:5: in <module>
    from .builder import (OPTIM_WRAPPER_CONSTRUCTORS, OPTIMIZERS,
mmengine/optim/optimizer/builder.py:33: in <module>
    TORCH_OPTIMIZERS = register_torch_optimizers()
mmengine/optim/optimizer/builder.py:28: in register_torch_optimizers
    OPTIMIZERS.register_module(module=_optim)
mmengine/registry/registry.py:661: in register_module
    self._register_module(module=module, module_name=name, force=force)
mmengine/registry/registry.py:611: in _register_module
    raise KeyError(f'{name} is already registered in {self.name} '
E   KeyError: 'ASGD is already registered in optimizer at torch.optim.asgd'

tibor-reiss commented 1 week ago

I was not able to reproduce the error. So far I only see the issue with Adafactor, which was added to torch in 2.5. Could you please provide some details about your environment, e.g. the output of pip list (or the conda equivalent)?

tobim commented 5 days ago

This error comes from the nixpkgs package for mmengine. I verified that it is caused by the upgrade to PyTorch 2.5, and that commit 4c22f78cdea2981a2b48a167e9feffe4721f8901 fixes the issue.
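
For readers following the traceback: the KeyError is raised by the registry's duplicate-name check when the same class name is registered a second time without force. The sketch below is a simplified stand-in, not mmengine's actual Registry. ToyRegistry and the placeholder ASGD class are hypothetical names used only for illustration, and the message format is copied from the error shown in this issue. It does not explain why ASGD ends up registered twice under PyTorch 2.5, only how the check reacts when that happens.

```python
class ToyRegistry:
    """Simplified stand-in for a name-to-class registry (not mmengine's code)."""

    def __init__(self, name):
        self.name = name
        self._module_dict = {}

    def register_module(self, module, force=False):
        # Classes are keyed by their class name, so two registrations of
        # anything called "ASGD" collide unless force=True is passed.
        name = module.__name__
        existing = self._module_dict.get(name)
        if existing is not None and not force:
            raise KeyError(
                f'{name} is already registered in {self.name} '
                f'at {existing.__module__}')
        self._module_dict[name] = module
        return module


class ASGD:
    """Hypothetical placeholder; stands in for torch.optim.ASGD."""


OPTIMIZERS = ToyRegistry('optimizer')
OPTIMIZERS.register_module(ASGD)

# A second registration under the same name triggers the duplicate check.
try:
    OPTIMIZERS.register_module(ASGD)
except KeyError as err:
    duplicate_message = err.args[0]
else:
    duplicate_message = None

# With force=True the existing entry is overwritten instead of raising.
OPTIMIZERS.register_module(ASGD, force=True)
```

Because the failure happens at import time in mmengine/optim/optimizer/builder.py, every test module that imports mmengine hits it during collection, which matches the report that all tests fail in the same way.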