openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

pixel_F1Score 0.1 #1192

Closed monkeycc closed 1 year ago

monkeycc commented 1 year ago

PatchCore

The images are high resolution (1280*1024) and the defects are minor.

The training results were very poor. I'm not sure how to adjust the configuration, or whether to switch to another model.

/media/AI/anomalib/src/anomalib/utils/callbacks/__init__.py:142: UserWarning: Export option: None not found. Defaulting to no model export
  warnings.warn(f"Export option: {config.optimization.export_mode} not found. Defaulting to no model export")
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - GPU available: True (cuda), used: True
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - TPU available: False, using: 0 TPU cores
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - IPU available: False, using: 0 IPUs
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - HPU available: False, using: 0 HPUs
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(limit_predict_batches=1.0)` was configured so 100% of the batches will be used..
2023-07-19 22:49:26,487 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
2023-07-19 22:49:26,487 - anomalib - INFO - Training the model.
2023-07-19 22:49:26,770 - pytorch_lightning.utilities.rank_zero - INFO - You are using a CUDA device ('NVIDIA GeForce RTX 3060') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
2023-07-19 22:49:26,817 - anomalib.data.base.datamodule - INFO - No normal test images found. Sampling from training set using a split ratio of 0.20
/home/AI/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `ROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
2023-07-19 22:49:27,641 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/AI/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py:183: UserWarning: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer
  rank_zero_warn(
2023-07-19 22:49:27,645 - pytorch_lightning.callbacks.model_summary - INFO - 
  | Name                  | Type                     | Params
-------------------------------------------------------------------
0 | image_threshold       | AnomalyScoreThreshold    | 0     
1 | pixel_threshold       | AnomalyScoreThreshold    | 0     
2 | model                 | PatchcoreModel           | 24.9 M
3 | image_metrics         | AnomalibMetricCollection | 0     
4 | pixel_metrics         | AnomalibMetricCollection | 0     
5 | normalization_metrics | MinMax                   | 0     
-------------------------------------------------------------------
24.9 M    Trainable params
0         Non-trainable params
24.9 M    Total params
99.450    Total estimated model params size (MB)
Epoch 0:   0%|          | 0/23 [00:00<?, ?it/s]
/home/AI/anaconda3/envs/anomalib_env/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:138: UserWarning: `training_step` returned `None`. If this was on purpose, ignore this warning...
  self.warning_cache.warn("`training_step` returned `None`. If this was on purpose, ignore this warning...")
Epoch 0:  65%|██████▌   | 15/23 [00:10<00:05,  1.48it/s, loss=nan]
2023-07-19 22:49:37,774 - anomalib.models.patchcore.lightning_model - INFO - Aggregating the embedding extracted from the training set.
2023-07-19 22:49:37,776 - anomalib.models.patchcore.lightning_model - INFO - Applying core-set subsampling to get the embedding.
Epoch 0: 100%|██████████| 23/23 [04:40<00:00, 12.19s/it, loss=nan, pixel_F1Score=0.179, pixel_AUROC=0.985]
2023-07-19 22:54:08,388 - pytorch_lightning.utilities.rank_zero - INFO - `Trainer.fit` stopped: `max_epochs=1` reached.
Epoch 0: 100%|██████████| 23/23 [04:40<00:00, 12.21s/it, loss=nan, pixel_F1Score=0.179, pixel_AUROC=0.985]
2023-07-19 22:54:08,575 - anomalib.utils.callbacks.timer - INFO - Training took 280.93 seconds
2023-07-19 22:54:08,575 - anomalib - INFO - Loading the best model weights.
2023-07-19 22:54:08,575 - anomalib - INFO - Testing the model.
2023-07-19 22:54:08,579 - pytorch_lightning.utilities.rank_zero - INFO - You are using a CUDA device ('NVIDIA GeForce RTX 3060') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
2023-07-19 22:54:08,579 - anomalib.utils.callbacks.model_loader - INFO - Loading the model from /OUT/run/weights/lightning/model.ckpt
2023-07-19 22:54:08,776 - pytorch_lightning.accelerators.cuda - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing DataLoader 0: 100%|██████████| 8/8 [00:44<00:00,  5.61s/it]
2023-07-19 22:54:54,735 - anomalib.utils.callbacks.timer - INFO - Testing took 45.95708084106445 seconds
Throughput (batch_size=32) : 5.1787449429846655 FPS
Testing DataLoader 0: 100%|██████████| 8/8 [00:45<00:00,  5.65s/it]
────────────────────────────────────────────────
       Test metric             DataLoader 0
────────────────────────────────────────────────
       image_AUROC           0.974371612071991
      image_F1Score          0.95419842004776
       pixel_AUROC          0.9845668077468872
      pixel_F1Score         0.17864571511745453

Number of images: normal_dir: 560, abnormal_dir: 120


dataset:
  name: patchcore
  format: folder
  path: /media/patchcore/
  normal_dir: OK # name of the folder containing normal images.
  abnormal_dir: NG # name of the folder containing abnormal images.
  normal_test_dir: null # name of the folder containing normal test images.
  task: segmentation # classification or segmentation
  mask: MASK  #optional
  extensions: null
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: 224 # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./OUT/

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

samet-akcay commented 1 year ago

@monkeycc, can you add a bit more context please?

monkeycc commented 1 year ago

"Can you add more context?"

Sorry, the post has been re-edited.

alexriedel1 commented 1 year ago

To understand why your pixel F1 is so low, we need to look at your dataset, your annotations, and the model results. Why exactly are you interested in a high pixel F1 score?

If your images are very large and the defects are very small, you can try increasing the image size to 384 or even 512:

image_size: 384 # dimensions to which images are resized (mandatory)
center_crop: 384 # dimensions to which images are center-cropped after resizing (optional)
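
A rough way to see why a larger image_size can help is to work out how much of the original 1280*1024 image survives the resize. The sketch below is only back-of-the-envelope arithmetic, not anomalib code. It assumes the image is stretched to a square of image_size pixels, that the finest feature map PatchCore uses (layer2 of the ResNet backbone) has a stride of 8, and that a defect is about 10 pixels wide (a made-up figure, since the thread does not state the defect size). The helper names are hypothetical.

```python
# Illustrative arithmetic only. The 1280*1024 resolution and the
# image_size / center_crop values come from the thread; the 10-pixel
# defect width is an assumption made for this example.
ORIG_W, ORIG_H = 1280, 1024


def resized_extent(extent_px, orig_dim, image_size):
    """Length of an object after a side of `orig_dim` px is resized to `image_size` px."""
    return extent_px * image_size / orig_dim


def cell_footprint(orig_dim, image_size, stride=8):
    """Original-image pixels covered by one feature-map cell along one axis.

    stride=8 is the downsampling factor of ResNet `layer2`.
    """
    return stride * orig_dim / image_size


def cropped_border(orig_dim, image_size, center_crop):
    """Original-image pixels discarded on each side by the center crop."""
    return (image_size - center_crop) / 2 * orig_dim / image_size


defect_px = 10  # assumed defect width in the original image
for size, crop in [(256, 224), (384, 384), (512, 512)]:
    print(
        f"image_size={size}, center_crop={crop}: "
        f"defect width after resize = {resized_extent(defect_px, ORIG_W, size):.1f} px, "
        f"one layer2 cell spans {cell_footprint(ORIG_W, size):.1f} original px, "
        f"border lost per side = {cropped_border(ORIG_W, size, crop):.0f} original px"
    )
```

Under these assumptions, the 256/224 setting shrinks a 10-pixel defect to about 2 pixels, each feature cell spans 40 original pixels horizontally, and the center crop discards an 80-pixel strip on the left and right edges. Setting center_crop equal to image_size, as in the suggestion above, removes the border loss.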

samet-akcay commented 1 year ago

Also note that the F1 score is a fairly harsh metric that penalizes false positives. Please check your ground truth and predictions. If the predicted masks differ even slightly from the ground-truth masks, this will hurt your F1 score, even when the predictions look fine to you.
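
To make this concrete, here is a small self-contained sketch (plain Python, not anomalib or torchmetrics code) of how pixel F1 behaves when a tiny defect is localized correctly but segmented too generously. The mask sizes and positions are invented for illustration, and the helper names are hypothetical.

```python
def box(top, left, size):
    """Set of (row, col) pixels in a square of side `size`."""
    return {(r, c) for r in range(top, top + size) for c in range(left, left + size)}


def pixel_f1(gt, pred):
    """F1 over pixel sets: 2*TP / (2*TP + FP + FN)."""
    tp = len(gt & pred)    # defect pixels that were predicted
    fp = len(pred - gt)    # predicted pixels outside the defect
    fn = len(gt - pred)    # defect pixels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed example: a 4x4 ground-truth defect, fully covered by a 12x12
# predicted blob centered on it.
gt = box(100, 100, 4)     # 16 defect pixels
pred = box(96, 96, 12)    # 144 predicted pixels, all 16 defect pixels included

print(f"recall    = {len(gt & pred) / len(gt):.2f}")
print(f"precision = {len(gt & pred) / len(pred):.2f}")
print(f"pixel F1  = {pixel_f1(gt, pred):.2f}")
```

In this example every defect pixel is found (recall 1.0), yet the 128 extra predicted pixels push precision to about 0.11 and F1 to 0.2. The smaller the defects are relative to the predicted regions, the stronger this effect becomes, which is one possible reason for a high pixel AUROC alongside a low pixel F1.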

samet-akcay commented 1 year ago

Since this is not an issue with PatchCore itself but rather a use-case question, I'll convert this to a discussion. Thanks!