What's your folder structure? In MVTec, it's
my_dataset                       # root folder
    bottle                       # object type
        train                    # path: samples for training
            good                 # folder: contains only good samples for training
        test                     # path: samples for testing
            good                 # folder: contains only good samples for testing
            defected_type_1      # folder: defected images
            defected_type_2      # folder: defected images
            .....
        ground_truth             # folder: contains binary masks
            defected_type_1_mask # folder: binary masks
            defected_type_2_mask # folder: binary masks
            .....
Thanks for your reply!
To rule out the folder dataset as the error source, I reproduced the same issue using the MVTec dataset and the default model configs. Once I train on the provided ground-truth masks, and once on zero masks. Again, when training on the zero masks, the heatmap is scaled differently.
Should the output heatmap not be independent of the ground truth mask? From my understanding, it should only be used to compute the evaluation scores, correct?
My zero masks are simply created with np.zeros((900, 900, 3)).astype(np.uint8) and then stored with cv2.imwrite as PNGs.
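For reference, a minimal sketch of how such a zero mask can be written to disk (the output path here is just a placeholder, not the actual dataset layout):

import cv2
import numpy as np

# All-black dummy mask, same spatial size as the 900x900 images.
mask = np.zeros((900, 900, 3)).astype(np.uint8)

# Output path is only an example; it should mirror the ground_truth layout.
cv2.imwrite("ground_truth/defected_type_1_mask/000_mask.png", mask)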
Should the output heatmap not be independent of the ground truth mask? From my understanding, it should only be used to compute the evaluation scores, correct?
Relevant issue.
My zero masks are simply created with np.zeros((900, 900, 3)).astype(np.uint8) and then stored with cv2.imwrite as PNGs.
I'm wondering how zero masks can play a role here.
@samet-akcay
I'm trying to understand the workflow of the anomaly detection task. If I understand correctly, we only need good samples for training, and at test time there are good and bad samples with ground-truth masks, like in the MVTec dataset. In some cases, the program will adaptively pick a threshold. My custom dataset structure is as follows:
bottle                       # folder name with object type
    train                    # path: samples for training
        good                 # folder: contains only good samples for training
    test                     # path: samples for testing
        good                 # folder: contains only good samples for testing
        defected_type_1      # folder: defected images
    ground_truth             # folder: contains the binary masks
        defected_type_1_mask # folder: binary masks
I'm not sure how to change the config for this. For now, I changed the config as follows, pointing it at the test set only, whereas it should presumably be the train set.
dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: folder # mvtec
  path: C:/Users/..../bottle/test # ./datasets/MVTec
  normal: good
  abnormal: defected_type_1
  split_ratio: 0.2
  seed: 0
  task: segmentation
  mask: C:/Users/..../bottle/ground_truth/defected_type_1
  extensions: null
  # category: bottle
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 1 # 36
  transform_config: null
  create_validation_set: false
  tiling:
    apply: false # true
    tile_size: null # 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
If I don't have the ground truth, how should I change the config for the custom dataset? Also, any tips about the best practices to get optimal results?
At test time, the model normalizes both the anomaly map and the score: https://github.com/openvinotoolkit/anomalib/blob/8c1a04fc7fbd95fae1dac6cb0641a7381ec8e5d4/anomalib/deploy/inferencers/torch.py#L154
The values used for this normalization are obtained during validation, but they can also be provided by you in the metadata: https://github.com/openvinotoolkit/anomalib/blob/8c1a04fc7fbd95fae1dac6cb0641a7381ec8e5d4/anomalib/deploy/optimize.py#L30-L55
So the best way would be to check your model's metadata and adjust it if needed.
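For intuition, a rough sketch of this min-max normalization (not the exact anomalib code; min_val, max_val and the threshold are the values stored in the model's metadata):

import numpy as np

def normalize_min_max(anomaly_map, threshold, min_val, max_val):
    # Shift the map so that the threshold maps to 0.5, then clip to [0, 1].
    normalized = (anomaly_map - threshold) / (max_val - min_val) + 0.5
    return np.clip(normalized, 0, 1)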
Hi @LukasBommes and @innat, it's because you guys set the adaptive threshold to True. It computes the threshold that gives the best F1 score between the predictions and the annotations. When you provide zero masks, the model assumes that this is a segmentation task, computes the threshold from the zero masks, and gets confused. That's why you see the wrong heatmap as the output.
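For intuition, a hedged sketch of an F1-maximizing adaptive threshold (using scikit-learn on flattened scores and labels; not the exact anomalib implementation). With all-zero masks there are no positive labels, so the F1 curve is degenerate and the chosen threshold becomes meaningless:

import numpy as np
from sklearn.metrics import precision_recall_curve

def adaptive_threshold(scores: np.ndarray, labels: np.ndarray) -> float:
    # Sweep thresholds over the precision-recall curve and keep the one
    # that maximizes F1 between predictions and annotations.
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    f1 = 2 * precision * recall / (precision + recall + 1e-10)
    return thresholds[np.argmax(f1[:-1])]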
If you want to use a custom dataset, you should use the folder dataset. Here is a sample config showing how to use it:
If I don't have the ground truth, how should I change the config for the custom dataset? Also, any tips about the best practices to get optimal results?
@innat, if you don't have the mask annotations, you could set the mask path to null. Something like the following:
dataset:
  name: mvtec #options: [mvtec, btech, folder]
  format: folder # mvtec
  path: C:/Users/..../bottle/test # ./datasets/MVTec
  normal: good
  abnormal: defected_type_1
  split_ratio: 0.2
  seed: 0
  task: classification  # <- changed
  mask: null            # <- changed
  extensions: null
  # category: bottle
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 1 # 36
  transform_config: null
  create_validation_set: false
  tiling:
    apply: false # true
    tile_size: null # 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
Since there are no mask annotations, the task becomes classification. It's currently set manually, but we'll be adjusting this automatically.
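A hypothetical illustration of that automatic adjustment (not current anomalib behaviour): infer the task from whether a mask path is provided.

from typing import Optional

def infer_task(mask_path: Optional[str]) -> str:
    # Without mask annotations, fall back to classification.
    return "segmentation" if mask_path else "classification"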
@samet-akcay Thank you. My current dataset is kind of a mess (some samples have ground truth and some don't). I need to sort them out properly.
Could you please tell me how I should update the config for the following folder structure? It's like MVTec: good samples for training, and good + defected samples for testing/evaluation, with ground-truth binary masks.
dummy_folder_name            # folder name with object type
    train                    # path: samples for training
        good                 # folder: contains only good samples for training
    test                     # path: samples for testing
        good                 # folder: contains only good samples for testing
        defected_type_1      # folder: defected images
        defected_type_2      # folder: defected images
        .....
    ground_truth             # folder: contains the binary masks
        defected_type_1_mask # folder: binary masks
        defected_type_2_mask # folder: binary masks
        .....
I'm mostly confused by the following params:
name: any_name_ #options: [mvtec, btech, folder]
format: folder # mvtec
path: ?
normal: ?
abnormal: ?
task: segmentation
mask: ?
Lastly, the num_workers param is set to 36 by default. I think it should be adaptive, e.g. computed from the number of CPU cores of the running system.
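As an illustration of that suggestion (not current anomalib behaviour), num_workers could be derived from the host's CPU count:

import os

# Use as many workers as there are CPU cores, capped at the current default.
num_workers = min(os.cpu_count() or 1, 36)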
I would place the defected_type_1 and defected_type_2 directories into a defect folder. Otherwise, the folder dataset may not find both of the directories. Something like:
dummy_folder_name                # folder name with object type
    train                        # path: samples for training
        good                     # folder: contains only good samples for training
    test                         # path: samples for testing
        good                     # folder: contains only good samples for testing
        defect
            defected_type_1      # folder: defected images
            defected_type_2      # folder: defected images
            .....
    ground_truth                 # folder: contains the binary masks
        defect
            defected_type_1_mask # folder: binary masks
            defected_type_2_mask # folder: binary masks
            .....
name: dummy_folder_name
format: folder
path: path/to/dummy_folder_name
normal: good
abnormal: defect
task: segmentation
mask: path/to/dummy_folder_name/ground_truth/defect
I'm not 100% sure though; I need to double-check this.
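If you do reorganize the folders as above, a throwaway script like the following could move the defected_type_* directories under defect (the paths are assumptions; adjust them to your dataset root):

import shutil
from pathlib import Path

root = Path("path/to/dummy_folder_name")
for split in ("test", "ground_truth"):
    defect_dir = root / split / "defect"
    defect_dir.mkdir(exist_ok=True)
    # Move every defected_type_* (and *_mask) directory under defect/.
    for src in sorted((root / split).glob("defected_type_*")):
        if src.is_dir():
            shutil.move(str(src), str(defect_dir / src.name))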
Another alternative to the folder dataset parameters would be to remove path and only have normal_path, abnormal_path and mask_path. This needs some discussion though.
Okay, so in the config setup above
name: dummy_folder_name
format: folder
path: path/to/dummy_folder_name
normal: good # I think it should be 'train/good'
abnormal: defect # and it should be 'test/defect'
task: segmentation
mask: path/to/dummy_folder_name/ground_truth/defect
If so, where should I set the path for path/to/dummy_folder_name/test/good? Are good samples needed for the test set?
Argh, just noticed it now. The folder format assumes that the dataset doesn't have train/test splits. Your dataset, however, is already split into train/test. In this case, folder may not be the best option. You might want to use the mvtec format.
Examples in the docstring might be helpful to understand how FolderDataset works:
https://github.com/openvinotoolkit/anomalib/blob/development/anomalib/data/folder.py#L316
Thanks for all your inputs on this and good to know that the issue with the heatmap is expected behaviour. I am also looking forward to your new method for setting the threshold in an unsupervised manner.
@samet-akcay: I tried a custom folder dataset and set task: classification and mask: null. But this also gives me the KeyError: label reported here.
My dataset structure is as follows:
anomalib/datasets/mvtec_test
|-- normal
| |-- 000.png
| |-- 001.png
| |-- ...
|-- abnormal
| |-- 000.png
| |-- 001.png
| |-- ...
and this is the complete config
dataset:
  name: mvtec_test
  format: folder
  path: ./datasets/mvtec_test
  normal: normal
  abnormal: abnormal
  task: classification
  mask: null
  extensions: null
  split_ratio: 0.2
  seed: 0
  image_size: 224
  train_batch_size: 32
  test_batch_size: 1
  num_workers: 16
  transform_config: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  layers:
    - layer1
    - layer2
    - layer3
  metric: auc
  normalization_method: min_max # options: [none, min_max, cdf]
  threshold:
    image_default: 3
    pixel_default: 3
    adaptive: true

project:
  seed: 42
  path: ./results
  log_images_to: ["local"]
  logger: false
  save_to_csv: false

optimization:
  openvino:
    apply: false

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: null
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  checkpoint_callback: true
  default_root_dir: null
  deterministic: false
  fast_dev_run: false
  gpus: 1
  gradient_clip_val: 0
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 1
  max_steps: -1
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: 1
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  prepare_data_per_node: true
  process_position: 0
  profiler: null
  progress_bar_refresh_rate: null
  replace_sampler_ddp: true
  stochastic_weight_avg: false
  sync_batchnorm: false
  terminate_on_nan: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.
  weights_save_path: null
  weights_summary: top
Thanks for fixing this. I'll give it a try in the afternoon.
I ran into an issue when training on a dataset (a subset of MVTec) where I set all ground-truth masks to zero (to simulate training on a dataset for which I have no ground-truth masks). When training with the actual ground-truth masks, the model produces heatmaps as expected, as in the first image below (produced with tools/inference.py). However, when training with the zero masks, the heatmaps seem to be scaled differently, as in the second image below. The confidence score seems unaffected. This behaviour is the same for both PADIM and PatchCore; I haven't tested the other models.
This is my model config for PADIM