openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

Patchcore multiple test batch size is not supported. #268

Closed haobo827 closed 2 years ago

haobo827 commented 2 years ago

I have another problem after dealing with #243. That is:

ValueError: Either preds and target both should have the (same) shape (N, ...), or target should be (N, ...) and preds should be (N, C, ...).
Epoch 0: 100%|██████████| 34/34 [09:17<00:00, 16.39s/it, loss=nan]

From:

File "/home/devadmin/haobo/anomalib_venv/lib/python3.8/site-packages/torchmetrics/utilities/checks.py", line 269, in _check_classification_inputs
    case, implied_classes = _check_shape_and_type_consistency(preds, target)
File "/home/devadmin/haobo/anomalib_venv/lib/python3.8/site-packages/torchmetrics/utilities/checks.py", line 115, in _check_shape_and_type_consistency

Then I print preds and target:

Aggregating the embedding extracted from the training set.
Creating CoreSet Sampler via k-Center Greedy
Getting the coreset from the main embedding.
Assigning the coreset as the memory bank.
Epoch 0: 100%|█████████████████████████████████████████████████████| 34/34 [08:59<00:00, 15.85s/it, loss=nan]
preds is: tensor([1.4457])
target is: tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], dtype=torch.int32)
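For context, the error comes from torchmetrics' input validation: preds of shape (1,) and target of shape (1, 16) satisfy neither accepted pattern. A minimal sketch of that shape rule (my own simplification, not the actual torchmetrics code) shows why these tensors are rejected:

```python
def shapes_consistent(preds_shape, target_shape):
    """Simplified version of the torchmetrics shape rule:
    either preds and target share the same (N, ...) shape,
    or preds is (N, C, ...) while target is (N, ...)."""
    if preds_shape == target_shape:
        return True
    # preds (N, C, ...) vs target (N, ...): preds carries an extra class dim
    if (len(preds_shape) == len(target_shape) + 1
            and preds_shape[0] == target_shape[0]
            and preds_shape[2:] == target_shape[1:]):
        return True
    return False

# The logged tensors: one prediction score vs. one row of 16 labels.
print(shapes_consistent((1,), (1, 16)))  # False -> the ValueError above
# With test_batch_size: 1 the per-batch shapes line up.
print(shapes_consistent((1,), (1,)))     # True
```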

my patchcore config.yaml is:

dataset:
  name: wafer_line #options: [mvtec, btech, folder]
  format: folder
  path: ../data/wafer_line/
  normal_dir: "train/Negative"
  abnormal_dir: "test/Positive"
  normal_test_dir: "test/Negative"
  task: segmentation
  mask: ../data/wafer_line/ground_truth/Positive
  extensions: ".jpg"
  split_ratio: 0.1
  seed: 0
  image_size: 256
  train_batch_size: 16
  test_batch_size: 16
  num_workers: 20
  transform_config:
    train: null
    val: null
  create_validation_set: false
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16
trainer:
  accelerator: "gpu" # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: true
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 0.05
  limit_val_batches: 1.0
  log_every_n_steps: 50
  log_gpu_memory: null
  max_epochs: 10000
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: 1
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  strategy: null
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.

Thank you for your patience in reading and answering!

haobo827 commented 2 years ago

should I set test_batch_size=1?

samet-akcay commented 2 years ago

Can you share the tree structure of your dataset please?

haobo827 commented 2 years ago

Can you share the tree structure of your dataset please?

data
└── wafer_line
    ├── ground_truth
    │   └── Positive
    ├── test
    │   ├── Positive
    │   └── Negative
    └── train
        └── Negative

And I find that coreset sampling takes too much time because it calculates the minimum distances on the CPU; it takes about an hour and a half on my machine. A lower coreset_sampling_ratio will reduce the computation time, but will it also reduce performance?

I want to know if there is a good way to solve this CPU calculation problem. (I use a Tesla V100S-PCIE-32GB.)
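For intuition on where that hour and a half goes: k-center greedy repeatedly picks the embedding farthest from the current coreset and updates every point's minimum distance, so the cost scales with n·k. Below is a toy NumPy sketch of the algorithm (my own simplification, not anomalib's KCenterGreedy sampler), which also shows why lowering the sampling ratio cuts the runtime:

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, sampling_ratio: float, seed: int = 0):
    """Toy k-center greedy coreset selection: repeatedly pick the point
    farthest from the current coreset. Each iteration recomputes a full
    distance column, so the cost grows with n * k -- a large embedding
    bank is slow on CPU, and a lower sampling ratio shrinks k."""
    n = embeddings.shape[0]
    k = max(1, int(n * sampling_ratio))
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]
    # min distance from every point to the current coreset
    min_dist = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    while len(selected) < k:
        idx = int(np.argmax(min_dist))  # farthest remaining point
        selected.append(idx)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(embeddings - embeddings[idx], axis=1)
        )
    return selected

emb = np.random.default_rng(0).normal(size=(1000, 8))
coreset = k_center_greedy(emb, sampling_ratio=0.1)
print(len(coreset))  # 100 points kept as the memory bank
```

The same distance updates run much faster on a GPU tensor, which is why keeping the embeddings on the device (rather than the CPU) addresses the slowdown.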

samet-akcay commented 2 years ago

Can you set accelerator: "gpu" # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto"> to auto? It shouldn't train on the CPU; if it does, something is wrong.

samet-akcay commented 2 years ago

should I set test_batch_size=1?

Yes, if you set test_batch_size: 1, it would work. We'll investigate why it doesn't work for multiple batch sizes.
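For reference, the workaround corresponds to this change in the dataset section of the config posted above (until multiple batch sizes are supported):

```yaml
dataset:
  test_batch_size: 1  # workaround: multi-batch test/inference not yet supported
```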

haobo827 commented 2 years ago

should I set test_batch_size=1?

Yes, if you set test_batch_size: 1, it would work. We'll investigate why it doesn't work for multiple batch sizes.

You are right. Much appreciated!

fujikosu commented 2 years ago

Hi @samet-akcay , thanks for developing this amazing library!

We'll investigate why it doesn't work for multiple batch sizes.

Has any investigation been done since then? It'd be great if you could share any info on this. I'm running inference over about 18,000 images. Although there is still plenty of GPU memory available during inference, the batch size 1 restriction means inference takes more than 10 hours to finish. So, it'd be great if we could set a batch size greater than 1.

samet-akcay commented 2 years ago

@fujikosu, we have just merged #580, which should address the multiple batch size problem. Let us know if you still encounter any issues. Thanks!