
[Bug]: Custom Callbacks are not being logged in `default_root_dir` since anomalib V1 #2117

CarlosNacher commented 3 months ago

Describe the bug

With the update to anomalib v1, the way callbacks are created changed: you can now pass callbacks in the config.yaml file with the desired init_args. Example below:

```yaml
trainer:
  ...
  callbacks:
  - class_path: anomalib.callbacks.checkpoint.ModelCheckpoint
    init_args:
      dirpath: weights/lightning
      filename: best_model-{epoch}-{image_F1Score:.2f}
      monitor: image_F1Score
      save_last: true
      mode: max
      auto_insert_metric_name: true
```
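
For reference, the same setup can be done through the Python API (a minimal sketch; I am assuming the v1 `Engine(callbacks=..., default_root_dir=...)` signature here):

```python
# Minimal sketch of the equivalent programmatic setup (assuming the v1
# Engine(callbacks=..., default_root_dir=...) signature).
from anomalib.callbacks.checkpoint import ModelCheckpoint
from anomalib.engine import Engine

checkpoint = ModelCheckpoint(
    dirpath="weights/lightning",  # relative path, same as in the YAML above
    filename="best_model-{epoch}-{image_F1Score:.2f}",
    monitor="image_F1Score",
    save_last=True,
    mode="max",
    auto_insert_metric_name=True,
)
engine = Engine(default_root_dir="results", callbacks=[checkpoint])
```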

So your model checkpoints will be saved under "weights/lightning". However, if you look at engine.py line 421:

```python
# Add ModelCheckpoint if it is not in the callbacks list.
has_checkpoint_callback = any(isinstance(c, ModelCheckpoint) for c in self._cache.args["callbacks"])
if has_checkpoint_callback is False:
    _callbacks.append(
        ModelCheckpoint(
            dirpath=self._cache.args["default_root_dir"] / "weights" / "lightning",
            filename="model",
            auto_insert_metric_name=False,
        ),
    )
```

the default ModelCheckpoint callback uses self._cache.args["default_root_dir"], which is updated in engine.py's _setup_workspace() method (line 315). So the default callback logs into this updated default_root_dir, but custom callbacks do not. I think the correct behavior would be for everything related to the same run to be logged within the same default_root_dir.
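
A possible direction for a fix (just a rough sketch of mine, not actual anomalib code; reroot_checkpoint_callbacks is a hypothetical helper): after _setup_workspace() resolves default_root_dir, re-root any user-supplied ModelCheckpoint whose dirpath is relative:

```python
# Rough sketch of a possible fix, NOT actual anomalib code: re-root relative
# dirpaths of user-supplied ModelCheckpoint callbacks so that everything from
# one run ends up under the same (resolved) default_root_dir.
from pathlib import Path

from lightning.pytorch.callbacks import Callback, ModelCheckpoint


def reroot_checkpoint_callbacks(callbacks: list[Callback], default_root_dir: Path) -> None:
    """Rewrite relative ModelCheckpoint dirpaths to live under default_root_dir."""
    for callback in callbacks:
        if isinstance(callback, ModelCheckpoint) and callback.dirpath is not None:
            dirpath = Path(callback.dirpath)
            if not dirpath.is_absolute():
                # e.g. "weights/lightning" -> "<default_root_dir>/weights/lightning"
                callback.dirpath = str(default_root_dir / dirpath)
```

That way the default checkpoint callback and custom ones would behave consistently.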

Dataset

N/A

Model

N/A

Steps to reproduce the behavior

  1. Train a model with the latest anomalib code, providing a custom ModelCheckpoint callback in your config.yaml and setting its dirpath init_arg to any relative path.
  2. You will see that some artifacts of your run are logged in one local folder, while the checkpoints are saved in another (the one specified in dirpath). A minimal reproduction sketch follows this list.
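
For completeness, a minimal programmatic reproduction sketch (Padim and MVTec are just placeholders for any model/datamodule; monitoring image_F1Score assumes that metric is configured):

```python
# Minimal reproduction sketch; Padim and MVTec are placeholders.
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Padim
from lightning.pytorch.callbacks import ModelCheckpoint

engine = Engine(
    default_root_dir="results",
    callbacks=[ModelCheckpoint(dirpath="weights/lightning", monitor="image_F1Score", mode="max")],
)
engine.fit(model=Padim(), datamodule=MVTec())
# Run artifacts (logs, images, the default "model.ckpt") land under the
# versioned workspace created from default_root_dir, while the custom
# checkpoint above is written to ./weights/lightning relative to the CWD.
```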

OS information


Expected behavior

All custom callbacks should be logged under the default_root_dir path (plus any additional subpath provided by the user).
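
For example, with the config below, dirpath: weights/lightning would ideally resolve to something like results/EfficientAd/my_dataset/<version>/weights/lightning (whatever exact layout _setup_workspace() produces), instead of ./weights/lightning relative to the current working directory.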

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

main

Configuration YAML

```yaml
data:
  class_path: anomalib.data.Folder
  init_args:
    name: my_dataset
    normal_dir: train/good
    root: data/processed/my_dataset
    abnormal_dir:
    - val/defect1
    - val/defect2
    normal_test_dir: val/good
    mask_dir: null
    normal_split_ratio: 0.2
    extensions:
    - .png
    train_batch_size: 1
    eval_batch_size: 1
    num_workers: 8
    task: classification
    image_size:
    - 1024
    - 608
    transform: null
    train_transform: null
    eval_transform: null
    test_split_mode: from_dir
    test_split_ratio: 0.2
    val_split_mode: same_as_test
    val_split_ratio: 0.5
    seed: 6120
model:
  class_path: anomalib.models.EfficientAd
  init_args:
    imagenet_dir: data/external/imagenette
    teacher_out_channels: 384
    model_size: S
    lr: 0.0001
    weight_decay: 1.0e-05
    padding: false
    pad_maps: true
normalization:
  normalization_method: min_max
metrics:
  image:
  - F1Score
  - AUROC
  pixel:
  - F1Score
  - AUROC
  threshold:
    class_path: anomalib.metrics.F1AdaptiveThreshold
    init_args:
      default_value: 0.5
logging:
  log_graph: false
seed_everything: 6120
task: classification
default_root_dir: results
ckpt_path: null
trainer:
  accelerator: auto
  strategy: auto
  devices: 1
  num_nodes: 1
  precision: 32
  logger:
  - class_path: anomalib.loggers.AnomalibWandbLogger
    init_args:
      project: my_project_name
  - class_path: anomalib.loggers.AnomalibMLFlowLogger
    init_args:
      experiment_name: my_project_name
  callbacks:
  - class_path: anomalib.callbacks.checkpoint.ModelCheckpoint
    init_args:
      dirpath: weights/lightning
      filename: best_model-{epoch}-{image_F1Score:.2f}
      monitor: image_F1Score
      save_last: true
      mode: max
      auto_insert_metric_name: true
  fast_dev_run: false
  max_epochs: 200
  min_epochs: null
  max_steps: 70000
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  overfit_batches: 0.0
  val_check_interval: 1.0
  check_val_every_n_epoch: 1
  num_sanity_val_steps: 0
  log_every_n_steps: 50
  enable_checkpointing: true
  enable_progress_bar: true
  enable_model_summary: true
  accumulate_grad_batches: 1
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  deterministic: false
  benchmark: false
  inference_mode: true
  use_distributed_sampler: true
  profiler: null
  detect_anomaly: false
  barebones: false
  plugins: null
  sync_batchnorm: false
  reload_dataloaders_every_n_epochs: 0
  default_root_dir: null
```

Logs

None

Code of Conduct