openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: Unexpected key(s) in state_dict: "pixel_metrics.AUPRO.fpr_limit" #988

Closed. preminstrel closed this issue 12 months ago.

preminstrel commented 1 year ago

Describe the bug

An error is raised when loading the checkpoint:

Unexpected key(s) in state_dict: "pixel_metrics.AUPRO.fpr_limit"

The error seems to be related to the AUPRO metric.
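
If I understand the mechanics correctly, torchmetrics-style metrics serialize any registered persistent buffer into the module's state_dict. Below is a minimal sketch of the suspected cause (the class is illustrative only, not anomalib's actual AUPRO): if the metric registers fpr_limit as a buffer, every checkpoint saved with AUPRO among the pixel metrics carries the key "pixel_metrics.AUPRO.fpr_limit", and loading it into a module whose AUPRO no longer has that buffer fails with exactly this error.

import torch
from torchmetrics import Metric

class AUPROSketch(Metric):
    """Illustrative stand-in for AUPRO; not anomalib's real implementation."""

    def __init__(self, fpr_limit: float = 0.3) -> None:
        super().__init__()
        # A persistent buffer is written into state_dict. Inside a metric
        # collection stored as `pixel_metrics`, it serializes under the key
        # "pixel_metrics.AUPRO.fpr_limit".
        self.register_buffer("fpr_limit", torch.tensor(fpr_limit))

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        pass  # state updates omitted; irrelevant to the serialization issue

    def compute(self) -> torch.Tensor:
        return torch.tensor(0.0)  # placeholder; irrelevant here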

Dataset

Folder

Model

PADiM

Steps to reproduce the behavior

After training the model and saving the checkpoint, I want to use tools/test.py to test it.

I run the command:

python tools/test.py --model padim \
    --config RESC_padim.yaml \
    --weight_file results/padim/RESC/run/weights/model-v4.ckpt
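
For reference, my reading of what tools/test.py does at this commit, simplified and with CLI parsing replaced by the literal arguments from the command above (the call sites match the traceback in the Logs section):

from pytorch_lightning import Trainer

from anomalib.config import get_configurable_parameters
from anomalib.data import get_datamodule
from anomalib.models import get_model
from anomalib.utils.callbacks import get_callbacks

# Simplified sketch of tools/test.py; argument parsing omitted.
config = get_configurable_parameters(
    model_name="padim",
    config_path="RESC_padim.yaml",
    weight_file="results/padim/RESC/run/weights/model-v4.ckpt",
)
datamodule = get_datamodule(config)
model = get_model(config)
callbacks = get_callbacks(config)  # includes the checkpoint-loading callback

trainer = Trainer(callbacks=callbacks, **config.trainer)
trainer.test(model=model, datamodule=datamodule)  # raises the RuntimeError below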

OS information

OS information:

Expected behavior

The checkpoint loads without error.

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

83a1b99

Configuration YAML

dataset:
  name: RESC
  format: folder
  path: /home/jinan/Doris/dataset/RESC_Pnet-Test/
  task: segmentation # classification or segmentation

  train_batch_size: 32
  eval_batch_size: 32
  inference_batch_size: 1
  num_workers: 8
  image_size: 100 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]

  normal_dir: /home/jinan/Doris/dataset/RESC_Pnet-Test/train/good # name of the folder containing normal images.
  abnormal_dir: /home/jinan/Doris/dataset/RESC_Pnet-Test/test/Ungood # name of the folder containing abnormal images.
  mask: /home/jinan/Doris/dataset/RESC_Pnet-Test/test_label/Ungood # optional
  normal_test_dir: /home/jinan/Doris/dataset/RESC_Pnet-Test/test/good/ # optional
  extensions: null # type of the image extensions to read from the directory. Defaults to None.

  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: from_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.3 # ratio for validation
  seed: 0

  transform_config:
      train: null
      val: null
  tiling:
      apply: false
      tile_size: null
      stride: null
      remove_border_count: 0
      use_random_tiling: False
      random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  normalization_method: min_max # options: [none, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
    - AUPRO
  threshold:
    method: adaptive # options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: [onnx, openvino]

# PL Trainer Args. Don't add extra parameters here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
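
To confirm that the offending key comes from the metric state rather than the model weights, the checkpoint can be inspected directly (a small diagnostic sketch, using the checkpoint path from the command above):

import torch

# List the metric-related keys stored in the checkpoint's state_dict.
ckpt = torch.load(
    "results/padim/RESC/run/weights/model-v4.ckpt", map_location="cpu"
)
metric_keys = [
    key for key in ckpt["state_dict"]
    if key.startswith(("image_metrics", "pixel_metrics"))
]
print(metric_keys)  # expected to include "pixel_metrics.AUPRO.fpr_limit"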

Logs

To use wandb logger install it using `pip install wandb`
/home/jinan/2023-Doris/hanshi/Retinal-OCT-AD/anomalib/src/anomalib/config/config.py:275: UserWarning: config.project.unique_dir is set to False. This does not ensure that your results will be written in an empty directory and you may overwrite files.
  warn(
Global seed set to 42
/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `PrecisionRecallCurve` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
FeatureExtractor is deprecated. Use TimmFeatureExtractor instead. Both FeatureExtractor and TimmFeatureExtractor will be removed in a future release.
/home/jinan/2023-Doris/hanshi/Retinal-OCT-AD/anomalib/src/anomalib/utils/callbacks/__init__.py:142: UserWarning: Export option: None not found. Defaulting to no model export
  warnings.warn(f"Export option: {config.optimization.export_mode} not found. Defaulting to no model export")
/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:55: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v2.0. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:67: UserWarning: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
  warning_cache.warn(
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_predict_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
Traceback (most recent call last):
  File "anomalib/tools/test.py", line 56, in <module>
    test()
  File "anomalib/tools/test.py", line 52, in test
    trainer.test(model=model, datamodule=datamodule)
  File "/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 794, in test
    return call._call_and_handle_interrupt(
  File "/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 842, in _test_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1051, in _run
    self._call_setup_hook()  # allow user to setup lightning_module in accelerator environment
  File "/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1299, in _call_setup_hook
    self._call_callback_hooks("setup", stage=fn)
  File "/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1394, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "/home/jinan/2023-Doris/hanshi/Retinal-OCT-AD/anomalib/src/anomalib/utils/callbacks/model_loader.py", line 32, in setup
    pl_module.load_state_dict(torch.load(self.weights_path, map_location=pl_module.device)["state_dict"])
  File "/home/jinan/2023-Doris/hanshi/Retinal-OCT-AD/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 244, in load_state_dict
    return super().load_state_dict(state_dict, strict=strict)
  File "/home/jinan/anaconda3/envs/anomalib/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PadimLightning:
        Unexpected key(s) in state_dict: "pixel_metrics.AUPRO.fpr_limit".
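
A possible workaround until this is fixed (an untested sketch on my side, not an official fix) is to strip the stale metric buffer from the checkpoint before loading, or to load the state dict non-strictly:

import torch

# Workaround sketch: drop the stale AUPRO buffer from the checkpoint.
ckpt_path = "results/padim/RESC/run/weights/model-v4.ckpt"
ckpt = torch.load(ckpt_path, map_location="cpu")
ckpt["state_dict"].pop("pixel_metrics.AUPRO.fpr_limit", None)
torch.save(ckpt, ckpt_path)

# Alternatively, tolerate mismatched metric keys when loading:
# model.load_state_dict(ckpt["state_dict"], strict=False)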

Code of Conduct

WenjingKangIntel commented 1 year ago

anomalib_Team3 working on this