openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2) #1331

Closed: xingfenghaizeiwang closed this issue 1 year ago

xingfenghaizeiwang commented 1 year ago

Describe the bug

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)

Dataset

Folder

Model

PatchCore

Steps to reproduce the behavior

python tools/train.py --config D:\project\anomalib-main\anomalib-main\src\anomalib\models\patchcore\config.yaml --model patchcore

OS information

OS information:

Expected behavior

Unsupervised training

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

No response

Configuration YAML

dataset:
  name: mvtec
  format: folder
  root: D:\project\anomalib-main\anomalib-main\datasets\MVTec
  normal_dir: card/train/good
  normal_test_dir: card/test/good
  abnormal_dir: card/test/ng
  task: segmentation
  mask_dir: null
  extensions: null
  train_batch_size: 32
  eval_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: 224 # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: none # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: onnx # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

Logs

File "D:\project\env\anomalib\lib\site-packages\torch\onnx\utils.py", line 989, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "D:\project\env\anomalib\lib\site-packages\torch\onnx\utils.py", line 893, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "D:\project\env\anomalib\lib\site-packages\torch\jit\_trace.py", line 1268, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "D:\project\env\anomalib\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\project\env\anomalib\lib\site-packages\torch\jit\_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "D:\project\env\anomalib\lib\site-packages\torch\jit\_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
    return forward_call(*args, **kwargs)
  File "D:\project\env\anomalib\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "D:\project\env\anomalib\lib\site-packages\anomalib\models\patchcore\torch_model.py", line 81, in forward
    patch_scores, locations = self.nearest_neighbors(embedding=embedding, n_neighbors=1)
  File "D:\project\env\anomalib\lib\site-packages\anomalib\models\patchcore\torch_model.py", line 178, in nearest_neighbors
    distances = self.euclidean_dist(embedding, self.memory_bank)
  File "D:\project\env\anomalib\lib\site-packages\anomalib\models\patchcore\torch_model.py", line 163, in euclidean_dist
    res = x_norm - 2 * torch.matmul(x, y.transpose(-2, -1)) + y_norm.transpose(-2, -1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)
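
The last frame fails on y.transpose(-2, -1) inside euclidean_dist, where y is self.memory_bank. A minimal sketch of how that IndexError can arise, assuming the memory bank is still an empty placeholder tensor because coreset subsampling never ran (the embedding shape below is illustrative only):

import torch

# Assumption: the memory bank stays an empty 1-D placeholder if the validation
# stage (where PatchCore's coreset subsampling happens) never sees any data.
memory_bank = torch.Tensor()        # shape (0,): a 1-D tensor with no entries
embedding = torch.randn(16, 1536)   # dummy patch embeddings; shape chosen for illustration

# A 1-D tensor has no dimension -2, so transposing dims -2 and -1 raises:
# IndexError: Dimension out of range (expected to be in range of [-1, 0], but got -2)
memory_bank.transpose(-2, -1)

If that assumption holds, anything that keeps the validation stage from running leaves the memory bank empty, and the ONNX trace then fails at this line, which matches the discussion below.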

blaz-r commented 1 year ago

Hello. I'll try to reproduce this, but I trained PatchCore and exported to ONNX just recently and it worked, so I'm not sure what exactly is going on.

blaz-r commented 1 year ago

I just realized something and I think that's the issue. You have this in your config:

test_split_mode: none # options: [from_dir, synthetic]
test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)

You can't have the validation split set to "same_as_test" when your test split is set to "none". Setting the validation split to "none" doesn't work either, which might be an issue, since coreset subsampling is done at the beginning of validation and skipping it somehow messes with the ONNX export. So I'd say that if you don't have test or val data for your dataset, you can use the "synthetic" option, but right now it seems that you can't export to ONNX without a validation set.
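
For illustration, a split configuration along those lines might look like the snippet below (example values, not a verified fix; since the config above already points normal_test_dir and abnormal_dir at card/test/good and card/test/ng, from_dir is the natural choice, with synthetic as the fallback when no real test images exist):

test_split_mode: from_dir     # build the test set from normal_test_dir / abnormal_dir
test_split_ratio: 0.2         # fraction of train images held out for testing (usage depends on test_split_mode)
val_split_mode: same_as_test  # validation now has data, so coreset subsampling can run
val_split_ratio: 0.5          # fraction of train/test images held out for validation (usage depends on val_split_mode)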

samet-akcay commented 1 year ago

Duplicate of #977. As @blaz-r mentioned, you will need to set up a test set if you want a validation set. We are working on a fix.