[Bug]: RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape.

Describe the bug

When I convert a reverse_distillation model to ONNX, this error is raised.

Dataset

Other (please specify in the text field below)

Model

Reverse Distillation

Steps to reproduce the behavior

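Train reverse_distillation with tools/train.py using the configuration below (export_mode: onnx). The ONNX export that runs at the end of training fails.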

OS information

OS: Windows 10
Python 3.8.13
Anomalib version: 0.4.0
PyTorch version: 1.12.0
CUDA/cuDNN version: 11.3
GPU models and configuration: 1x GeForce RTX 3060 Ti
Any other relevant information: I'm using a custom dataset

Expected behavior

How can I solve this problem?

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

log_level: INFO # ['DEBUG', 'INFO', 'WARNING', 'ERROR']

dataset:
  name: mvtec
  format: mvtec
  path: D:/fxzhang/images/无纺布
  category: wufangbu1
  task: segmentation
  train_batch_size: 16
  eval_batch_size: 16
  inference_batch_size: 32
  num_workers: 0
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: reverse_distillation
  lr: 0.005
  backbone:  wide_resnet50_2 # resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  early_stopping:
    patience: 30
    metric: pixel_AUROC
    mode: max
  beta1: 0.5
  beta2: 0.99
  normalization_method: min_max # options: [null, min_max, cdf]
  anomaly_map_mode: multiply

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: tensorboard # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: onnx #options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 2 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 3
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

Logs

Traceback (most recent call last):
  File "D:/fxzhang/code/anomalib-0.4.0/anomalib-0.4.0/tools/train.py", line 75, in <module>
    train()
  File "D:/fxzhang/code/anomalib-0.4.0/anomalib-0.4.0/tools/train.py", line 61, in train
    trainer.fit(model=model, datamodule=datamodule)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 700, in fit
    self._call_and_handle_interrupt(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 654, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 741, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1166, in _run
    results = self._run_stage()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1282, in _run_train
    self.fit_loop.run()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\loops\loop.py", line 207, in run
    output = self.on_run_end()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 325, in on_run_end
    self.trainer._call_callback_hooks("on_train_end")
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1596, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "D:\fxzhang\code\anomalib-0.4.0\anomalib-0.4.0\anomalib\utils\callbacks\export.py", line 47, in on_train_end
    export(
  File "D:\fxzhang\code\anomalib-0.4.0\anomalib-0.4.0\anomalib\deploy\export.py", line 82, in export
    onnx_path = _export_to_onnx(model, input_size, export_path)
  File "D:\fxzhang\code\anomalib-0.4.0\anomalib-0.4.0\anomalib\deploy\export.py", line 99, in _export_to_onnx
    torch.onnx.export(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\__init__.py", line 350, in export
    return utils.export(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 163, in export
    _export(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 1074, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 731, in _model_to_graph
    graph = _optimize_graph(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 308, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\__init__.py", line 416, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 1406, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\symbolic_helper.py", line 232, in wrapper
    return fn(g, *args, **kwargs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\symbolic_opset9.py", line 1684, in _convolution
    raise RuntimeError(
RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape.
Epoch 2: 100%|██████████| 10/10 [00:09<00:00,  1.09it/s, loss=0.458, v_num=0, train_loss_step=0.377, train_loss_epoch=0.415, pixel_F1Score=0.288, pixel_AUROC=0.857]

Process finished with exit code 1

Code of Conduct

[X] I agree to follow this project's Code of Conduct

This does not seem to be an issue on the latest Anomalib version. Could you please upgrade to the latest version and try again?

Here is the config file I used. With it, I could train a model and run ONNX inference.

dataset:
  name: mvtec
  format: mvtec
  path: ./datasets/MVTec
  category: bottle
  task: segmentation
  train_batch_size: 32
  eval_batch_size: 32
  inference_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: reverse_distillation
  lr: 0.005
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  early_stopping:
    patience: 3
    metric: pixel_AUROC
    mode: max
  beta1: 0.5
  beta2: 0.99
  normalization_method: min_max # options: [null, min_max, cdf]
  anomaly_map_mode: multiply

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: onnx #options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 2 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 3
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

And here are the versions I used when trying to reproduce this:

>>> import anomalib
>>> anomalib.__version__
'1.0.0dev'
>>> import torch
>>> torch.__version__
'2.0.1+cu117'
>>> import onnx
>>> onnx.__version__
'1.14.0'
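
For anyone comparing environments, the same check can be scripted so the output can be pasted into a report. This is only a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical, and it assumes the packages are installed under the distribution names `anomalib`, `torch`, and `onnx`. The version reported by the package metadata may differ slightly from `__version__` for development installs.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are an assumption; adjust them to your environment.
    for name, version in installed_versions(["anomalib", "torch", "onnx"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the failing and the working environment makes it easier to see which package versions differ.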

I'm closing this issue. If you still experience issues, feel free to re-open it. Thanks!
