openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: A Multi-GPU Parallel Training error with API #1821

Closed pipiyaa closed 8 months ago

pipiyaa commented 8 months ago

Describe the bug

When I try to train PatchCore via the Python API, the following error occurs:

AttributeError: 'DistributedDataParallel' object has no attribute 'learning_type'

Dataset

MVTec

Model

PatchCore

Steps to reproduce the behavior

Run a .py file with the following API code:

```python
# Import the required modules
from anomalib.data import MVTec
from anomalib.models import Patchcore
from anomalib.engine import Engine

# Initialize the datamodule, model and engine
datamodule = MVTec(root="/home/syh/MVTecAD/mvtec_anomaly_detection")
model = Patchcore()
engine = Engine()

# Train the model
engine.fit(datamodule=datamodule, model=model)
```

OS information


Expected behavior

```
Traceback (most recent call last):
  File "/home/xx/anomalib/anomalib_pro/test.py", line 12, in <module>
    engine.fit(datamodule=datamodule, model=model)
  File "/home/xx/anomalib/src/anomalib/engine/engine.py", line 515, in fit
    self.trainer.fit(model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/strategies/launchers/subprocess_script.py", line 102, in launch
    return function(*args, **kwargs)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 359, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 136, in run
    self.advance(data_fetcher)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 259, in advance
    call._call_callback_hooks(trainer, "on_train_batch_end", batch_output, batch, batch_idx)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 208, in _call_callback_hooks
    fn(trainer, trainer.lightning_module, *args, **kwargs)
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py", line 285, in on_train_batch_end
    if self._should_skip_saving_checkpoint(trainer):
  File "/home/syh/anomalib/src/anomalib/callbacks/checkpoint.py", line 38, in _should_skip_saving_checkpoint
    is_zero_or_few_shot = trainer.model.learning_type in [LearningType.ZERO_SHOT, LearningType.FEW_SHOT]
  File "/root/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'DistributedDataParallel' object has no attribute 'learning_type'
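For context, the mechanics behind this traceback can be sketched without anomalib or Lightning installed. When training runs under the DDP strategy, `trainer.model` is the `DistributedDataParallel` wrapper rather than the anomalib `LightningModule`, and that wrapper only forwards `nn.Module`-registered attributes (parameters, buffers, submodules), not plain Python attributes such as `learning_type`. The stub classes below are hypothetical stand-ins, not anomalib or PyTorch code:

```python
class LightningModuleStub:
    """Stand-in for an anomalib model carrying a custom attribute."""
    learning_type = "one_class"


class DDPWrapperStub:
    """Stand-in for DistributedDataParallel: keeps the wrapped model
    under `.module` and does NOT forward arbitrary attributes."""
    def __init__(self, module):
        self.module = module


wrapped = DDPWrapperStub(LightningModuleStub())

# Direct access fails, mirroring the traceback above:
try:
    wrapped.learning_type
except AttributeError as exc:
    print(exc)

# Unwrapping first succeeds; Lightning exposes the inner module as
# trainer.lightning_module, which avoids this class of error:
inner = getattr(wrapped, "module", wrapped)
print(inner.learning_type)  # -> one_class
```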

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

2.1.0

Configuration YAML

none

Logs

none

Code of Conduct

samet-akcay commented 8 months ago

@pipiyaa, we are aware of this and would like to address it in v1.1. Please follow https://github.com/openvinotoolkit/anomalib/issues/1449 for further updates.

Closing this one since it is a duplicate. Thanks!