openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0
3.41k stars, 616 forks

[Bug]: FileNotFoundError in "001_getting_started" tutorial #2053

Closed: baek85 closed this issue 1 month ago

baek85 commented 1 month ago

Describe the bug

A FileNotFoundError is raised while fitting the Padim model:

  # start training
  engine = Engine(task=TaskType.SEGMENTATION)
  engine.fit(model=model, datamodule=datamodule)

Dataset

MVTec

Model

PADiM

Steps to reproduce the behavior

Follow the tutorial code in "001_getting_started.ipynb".

OS information

OS information:

Expected behavior

The model fit should complete, estimating the multivariate Gaussian distribution of the MVTec AD dataset using the pre-trained model.

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

main

Configuration YAML

No config in tutorial

Logs

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[8], line 3
      1 # start training
      2 engine = Engine(task=TaskType.SEGMENTATION)
----> 3 engine.fit(model=model, datamodule=datamodule)

File ~/.local/lib/python3.10/site-packages/anomalib/engine/engine.py:533, in Engine.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    524     ckpt_path = Path(ckpt_path).resolve()
    526 self._setup_workspace(
    527     model=model,
    528     train_dataloaders=train_dataloaders,
   (...)
    531     versioned_dir=True,
    532 )
--> 533 self._setup_trainer(model)
    534 self._setup_dataset_task(train_dataloaders, val_dataloaders, datamodule)
    535 self._setup_transform(model, datamodule=datamodule, ckpt_path=ckpt_path)

File ~/.local/lib/python3.10/site-packages/anomalib/engine/engine.py:324, in Engine._setup_trainer(self, model)
    321     self._cache.update(model)
    323 # Setup anomalib callbacks to be used with the trainer
--> 324 self._setup_anomalib_callbacks()
    326 # Temporarily set devices to 1 to avoid issues with multiple processes
    327 self._cache.args["devices"] = 1

File ~/.local/lib/python3.10/site-packages/anomalib/engine/engine.py:415, in Engine._setup_anomalib_callbacks(self)
    412 has_checkpoint_callback = any(isinstance(c, ModelCheckpoint) for c in self._cache.args["callbacks"])
    413 if has_checkpoint_callback is False:
    414     _callbacks.append(
--> 415         ModelCheckpoint(
    416             dirpath=self._cache.args["default_root_dir"] / "weights" / "lightning",
    417             filename="model",
    418             auto_insert_metric_name=False,
    419         ),
    420     )
    422 # Add the post-processor callbacks.
    423 _callbacks.append(_PostProcessorCallback())

File ~/.local/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:253, in ModelCheckpoint.__init__(self, dirpath, filename, monitor, verbose, save_last, save_top_k, save_weights_only, mode, auto_insert_metric_name, every_n_train_steps, train_time_interval, every_n_epochs, save_on_train_epoch_end, enable_version_counter)
    251 self.dirpath: Optional[_PATH]
    252 self.__init_monitor_mode(mode)
--> 253 self.__init_ckpt_dir(dirpath, filename)
    254 self.__init_triggers(every_n_train_steps, every_n_epochs, train_time_interval)
    255 self.__validate_init_configuration()

File ~/.local/lib/python3.10/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:475, in ModelCheckpoint.__init_ckpt_dir(self, dirpath, filename)
    472 self._fs = get_filesystem(dirpath if dirpath else "")
    474 if dirpath and _is_local_file_protocol(dirpath if dirpath else ""):
--> 475     dirpath = os.path.realpath(os.path.expanduser(dirpath))
    477 self.dirpath = dirpath
    478 self.filename = filename

File /opt/conda/lib/python3.10/posixpath.py:396, in realpath(filename, strict)
    393     """Return the canonical path of the specified filename, eliminating any
    394 symbolic links encountered in the path."""
    395     filename = os.fspath(filename)
--> 396     path, ok = _joinrealpath(filename[:0], filename, strict, {})
    397     return abspath(path)

File /opt/conda/lib/python3.10/posixpath.py:456, in _joinrealpath(path, rest, strict, seen)
    454         return join(newpath, rest), False
    455 seen[newpath] = None # not resolved symlink
--> 456 path, ok = _joinrealpath(path, os.readlink(newpath), strict, seen)
    457 if not ok:
    458     return join(path, rest), False

FileNotFoundError: [Errno 2] No such file or directory: '/home/sunghyun.baek/home/workspace/AD/anomalib-main/results/Padim/MVTec/bottle/latest'


alexriedel1 commented 1 month ago

You must run all the cells of the notebook in order.

baek85 commented 1 month ago

> You must run all the cells of the notebook in order.

I did run all the cells of the notebook in order.

cjy513203427 commented 1 month ago

Have you run these?

  datamodule = MVTec(num_workers=0)
  datamodule.prepare_data()  # Downloads the dataset if it's not in the specified `root` directory
  datamodule.setup()  # Create train/val/test/prediction sets.

  i, data = next(enumerate(datamodule.val_dataloader()))
  print(data.keys())

This code will download the MVTec dataset.

baek85 commented 1 month ago

@cjy513203427 The data loading code you mentioned runs fine, but when I run 'engine.fit(model=model, datamodule=datamodule)' afterwards, I get the same error.

The error message seems to refer to a folder called 'anomalib-main/results/Padim/MVTec/bottle/latest'. Does anyone know why I am getting this error?

alexriedel1 commented 1 month ago

Hmm, yes, it's about the model checkpoint directories. Did you try starting from a freshly installed Anaconda environment?

I'm not even sure how os.path.realpath() can fail, because it is not supposed to. Maybe you have some unexpected symlink?
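To illustrate the point about os.path.realpath(), here is a minimal sketch of how the standard library normally treats a dangling symlink on a POSIX system. The names `latest` and `v0` are hypothetical stand-ins chosen to mirror the path in the traceback; this is not anomalib or Lightning code, and it does not reproduce the reporter's exact failure.

```python
import os
import tempfile


def demo_dangling_symlink() -> tuple[str, bool, bool]:
    """Create a dangling symlink and report how the os.path helpers see it."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "latest")  # hypothetical link name
        # Point the link at a target ("v0") that is never created.
        os.symlink(os.path.join(tmp, "v0"), link)
        # Default (non-strict) realpath resolves the link without raising.
        resolved = os.path.realpath(link)
        # islink() inspects the link itself; exists() follows it to the target.
        return os.path.basename(resolved), os.path.islink(link), os.path.exists(link)


print(demo_dangling_symlink())
```

In the default non-strict mode the call returns the link's target even though the target is missing, which is consistent with the surprise expressed above. A failure inside os.readlink(), as in the traceback, therefore suggests something unusual about the link or the filesystem it lives on.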

cjy513203427 commented 1 month ago

> @cjy513203427 The data loading code you mentioned runs fine, but when I run 'engine.fit(model=model, datamodule=datamodule)' afterwards, I get the same error.
>
> The error message seems to refer to a folder called 'anomalib-main/results/Padim/MVTec/bottle/latest'. Does anyone know why I am getting this error?

The easiest fix is to delete the project and the conda environment, clone the repository again, and reinstall the dependencies. Maybe the path '/home/sunghyun.baek/home/workspace/AD/anomalib-main/results/Padim/MVTec/bottle/latest' is too long. Or it could be a permission problem: I once had os.makedirs fail with a permission error on Ubuntu.

baek85 commented 1 month ago

It was a symlink problem in my setup.
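For anyone hitting a similar error, the following is a rough sketch of one way to look for broken symlinks along a path. The helper `find_broken_symlinks` is hypothetical (it is not part of anomalib or Lightning), and the thread does not say which link was actually at fault, so treat it as a diagnostic starting point only.

```python
from pathlib import Path


def find_broken_symlinks(path: str) -> list[str]:
    """Return each component of `path` that is a symlink with a missing target."""
    broken = []
    p = Path(path).expanduser()
    # Check the path itself and every ancestor directory.
    for candidate in [p, *p.parents]:
        # is_symlink() looks at the link itself; exists() follows it.
        if candidate.is_symlink() and not candidate.exists():
            broken.append(str(candidate))
    return broken
```

Calling it on the directory named in the error message, for example `find_broken_symlinks("results/Padim/MVTec/bottle/latest")`, lists any dangling links among the path components. A reported link can then be removed or re-pointed before rerunning `engine.fit`.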