openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: Segmentation masks do not correspond to classification results #1380

Open phcarval opened 11 months ago

phcarval commented 11 months ago

Describe the bug

Hello,

Having tested a few methods using the changes made in PR https://github.com/openvinotoolkit/anomalib/pull/1378, I have noticed that the segmentation results do not correspond to the classification results.

In the screenshots below, you will see that images classified as "normal" may still contain predicted defective areas: this is particularly prevalent with DRAEM, but it also happens with other methods, such as PaDiM and EfficientAD.

EfficientAD: image

PaDiM: image

DRAEM: image

Dataset

MVTec

Model

N/A

Steps to reproduce the behavior

git clone anomalib
cd anomalib
build container via VSCode
pip install -e .
checkout phcarval:more_segmentation_info
python3 tools/train.py --config src/anomalib/models/{model}/config.yaml

OS information


Expected behavior

Predicted masks should only appear when the classification result is "anomalous".
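
For illustration, a minimal post-processing sketch of this expected behavior (not anomalib's current behavior; all names below are hypothetical): suppress the predicted mask whenever the image-level decision is "normal".

import torch

def gate_mask_by_image_label(
    anomaly_map: torch.Tensor,   # (H, W) pixel-wise anomaly scores
    pixel_threshold: float,      # threshold used for segmentation
    image_score: float,          # image-level anomaly score
    image_threshold: float,      # threshold used for classification
) -> torch.Tensor:
    # Return a non-empty binary mask only if the image itself is classified as anomalous.
    if image_score <= image_threshold:
        # Image predicted "normal" -> no defective regions should be shown.
        return torch.zeros_like(anomaly_map, dtype=torch.bool)
    return anomaly_map > pixel_threshold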

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

phcarval:more_segmentation_info

Configuration YAML

dataset:
  name: mvtec
  format: mvtec
  path: /gpfswork/rech/hfq/upm97lh/data/MVTec
  category: bottle
  task: segmentation
  train_batch_size: 8
  eval_batch_size: 32
  num_workers: 8
  image_size:
  - 256
  - 256
  center_crop: null
  normalization: imagenet
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir
  test_split_ratio: 0.2
  val_split_mode: same_as_test
  val_split_ratio: 0.5
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: false
    random_tile_count: 16
model:
  name: draem
  anomaly_source_path: null
  lr: 0.0001
  enable_sspcab: false
  sspcab_lambda: 0.1
  early_stopping:
    patience: 20
    metric: image_AUROC
    mode: max
  normalization_method: min_max
  input_size:
  - 256
  - 256
metrics:
  image:
  - F1Score
  - AUROC
  - Accuracy
  - Recall
  - Specificity
  pixel:
  - F1Score
  - AUROC
  threshold:
    method: adaptive
    manual_image: null
    manual_pixel: null
visualization:
  show_images: false
  save_images: true
  log_images: true
  image_save_path: null
  mode: full
project:
  seed: -1
  path: /gpfswork/rech/hfq/upm97lh/anomalib/results/draem/mvtec/bottle/run.2023-09-28_16-27-49
  unique_dir: true
logging:
  logger:
  - csv
  - tensorboard
  log_graph: false
optimization:
  export_mode: null
trainer:
  enable_checkpointing: true
  default_root_dir: /gpfswork/rech/hfq/upm97lh/anomalib/results/draem/mvtec/bottle/run.2023-09-28_16-27-49
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 700
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0
  log_every_n_steps: 50
  accelerator: auto
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

Logs

I have lost them, but I can try to provide them if needed.


blaz-r commented 11 months ago

Hello.

I think this might come from the way the image and pixel thresholds are calculated. Since these are independent thresholds, computed on different (yet not entirely independent) data, I believe it can happen that some anomalous pixels show up in the segmentation map while the image as a whole is still classified as normal.

This most likely happens because, in most models, the image-level anomaly score is produced by taking the maximum of the anomaly map, rather than by a separate process that actually computes an anomaly score (like PatchCore, for example, or other models that have a sort of sub-network to derive the score from the anomaly map and other features).

So I think that this is expected.
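
To make this concrete, here is an illustrative sketch with synthetic numbers (not real anomalib output or its API): the image score is the maximum of the anomaly map, but because the image and pixel thresholds are chosen independently, the pixel threshold can end up below that maximum.

import torch

anomaly_map = torch.tensor([[0.10, 0.20],
                            [0.55, 0.30]])  # pixel anomaly scores for one image
image_score = anomaly_map.max().item()      # 0.55: image score = max of the map

image_threshold = 0.60  # chosen (e.g. adaptively) from image-level labels
pixel_threshold = 0.50  # chosen (e.g. adaptively) from pixel-level masks

pred_label = image_score > image_threshold  # False -> image classified "normal"
pred_mask = anomaly_map > pixel_threshold   # but one pixel still exceeds 0.50

print(pred_label)       # False
print(pred_mask.sum())  # tensor(1): non-empty mask despite a "normal" label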

phcarval commented 10 months ago

I understand that if the anomaly score is calculated differently from the anomaly map (as in PatchCore), then this behavior is expected. However, for methods that take the anomaly map's maximum value as the score, wouldn't it make more sense to tie the choice of the pixel threshold to the image score threshold?
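
As a rough sketch of what I mean (an assumption on my side, not an existing anomalib option): for models whose image score is the maximum of the anomaly map, reusing the image threshold as the pixel threshold would guarantee that a "normal" image never produces a non-empty mask.

import torch

def predict_with_tied_threshold(anomaly_map: torch.Tensor, image_threshold: float):
    image_score = anomaly_map.max().item()
    pred_label = image_score > image_threshold
    # With a single tied threshold, any pixel above it also pushes the maximum
    # above it, so the mask is empty whenever the image is labelled "normal".
    pred_mask = anomaly_map > image_threshold
    return pred_label, pred_mask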

blaz-r commented 10 months ago

I'm not really sure, but right now these are separate values, each calculated independently to optimize its own task. I assume it would depend on the model used, but I can't say anything for sure.