openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape. #1059

Closed fat-921 closed 1 year ago

fat-921 commented 1 year ago

Describe the bug

When I use reverse_distillation and convert the model to ONNX, this error is raised.

Dataset

Other (please specify in the text field below)

Model

Reverse Distillation

Steps to reproduce the behavior

Train a reverse_distillation model and export it to ONNX (export_mode: onnx in the config below).

OS information

OS information:

Expected behavior

How to solve this problem?

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

log_level: INFO # ['DEBUG', 'INFO', 'WARNING', 'ERROR']

dataset:
  name: mvtec
  format: mvtec
  path: D:/fxzhang/images/无纺布
  category: wufangbu1
  task: segmentation
  train_batch_size: 16
  eval_batch_size: 16
  inference_batch_size: 32
  num_workers: 0
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: reverse_distillation
  lr: 0.005
  backbone:  wide_resnet50_2 # resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  early_stopping:
    patience: 30
    metric: pixel_AUROC
    mode: max
  beta1: 0.5
  beta2: 0.99
  normalization_method: min_max # options: [null, min_max, cdf]
  anomaly_map_mode: multiply

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: tensorboard # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: onnx #options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 2 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 3
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

Logs

Traceback (most recent call last):
  File "D:/fxzhang/code/anomalib-0.4.0/anomalib-0.4.0/tools/train.py", line 75, in <module>
    train()
  File "D:/fxzhang/code/anomalib-0.4.0/anomalib-0.4.0/tools/train.py", line 61, in train
    trainer.fit(model=model, datamodule=datamodule)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 700, in fit
    self._call_and_handle_interrupt(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 654, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 741, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1166, in _run
    results = self._run_stage()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1252, in _run_stage
    return self._run_train()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1282, in _run_train
    self.fit_loop.run()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\loops\loop.py", line 207, in run
    output = self.on_run_end()
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 325, in on_run_end
    self.trainer._call_callback_hooks("on_train_end")
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1596, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "D:\fxzhang\code\anomalib-0.4.0\anomalib-0.4.0\anomalib\utils\callbacks\export.py", line 47, in on_train_end
    export(
  File "D:\fxzhang\code\anomalib-0.4.0\anomalib-0.4.0\anomalib\deploy\export.py", line 82, in export
    onnx_path = _export_to_onnx(model, input_size, export_path)
  File "D:\fxzhang\code\anomalib-0.4.0\anomalib-0.4.0\anomalib\deploy\export.py", line 99, in _export_to_onnx
    torch.onnx.export(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\__init__.py", line 350, in export
    return utils.export(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 163, in export
    _export(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 1074, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 731, in _model_to_graph
    graph = _optimize_graph(
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 308, in _optimize_graph
    graph = _C._jit_pass_onnx(graph, operator_export_type)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\__init__.py", line 416, in _run_symbolic_function
    return utils._run_symbolic_function(*args, **kwargs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\utils.py", line 1406, in _run_symbolic_function
    return symbolic_fn(g, *inputs, **attrs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\symbolic_helper.py", line 232, in wrapper
    return fn(g, *args, **kwargs)
  File "D:\software\Anaconda3\envs\mmdp\lib\site-packages\torch\onnx\symbolic_opset9.py", line 1684, in _convolution
    raise RuntimeError(
RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape.
Epoch 2: 100%|██████████| 10/10 [00:09<00:00,  1.09it/s, loss=0.458, v_num=0, train_loss_step=0.377, train_loss_epoch=0.415, pixel_F1Score=0.288, pixel_AUROC=0.857]

Process finished with exit code 1


SimonB97 commented 1 year ago

Facing the same issue here, would appreciate a fix!

samet-akcay commented 1 year ago

This does not seem to be an issue on the latest Anomalib version. Could you please upgrade to the latest version and try again?

Here is the config file I used. With it, I could train a model and run ONNX inference.

dataset:
  name: mvtec
  format: mvtec
  path: ./datasets/MVTec
  category: bottle
  task: segmentation
  train_batch_size: 32
  eval_batch_size: 32
  inference_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: null # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: 64
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: reverse_distillation
  lr: 0.005
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  early_stopping:
    patience: 3
    metric: pixel_AUROC
    mode: max
  beta1: 0.5
  beta2: 0.99
  normalization_method: min_max # options: [null, min_max, cdf]
  anomaly_map_mode: multiply

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: onnx #options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 2 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 3
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
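
For readers comparing this config with the one in the original report, here is a minimal sketch of a nested-dict diff. The dictionaries are a hand-copied subset of the two YAML files in this thread (only the keys that differ, plus the model name and export mode), and the helper names `flatten` and `diff_configs` are hypothetical, not part of Anomalib.

```python
from typing import Any, Dict, Tuple

_MISSING = "<missing>"  # placeholder for keys present in only one config


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn a nested dict into {"a.b.c": value} form."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {dotted_key: (value_in_a, value_in_b)} for every differing key."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key, _MISSING), fb.get(key, _MISSING))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key, _MISSING) != fb.get(key, _MISSING)
    }


# Subset of the config from the original report (values copied from this thread).
reported = {
    "log_level": "INFO",
    "dataset": {
        "path": "D:/fxzhang/images/无纺布",
        "category": "wufangbu1",
        "train_batch_size": 16,
        "eval_batch_size": 16,
        "num_workers": 0,
    },
    "model": {"name": "reverse_distillation", "early_stopping": {"patience": 30}},
    "logging": {"logger": "tensorboard"},
    "optimization": {"export_mode": "onnx"},
}

# Same subset of the config posted in this comment.
maintainer = {
    "dataset": {
        "path": "./datasets/MVTec",
        "category": "bottle",
        "train_batch_size": 32,
        "eval_batch_size": 32,
        "num_workers": 8,
    },
    "model": {"name": "reverse_distillation", "early_stopping": {"patience": 3}},
    "logging": {"logger": []},
    "optimization": {"export_mode": "onnx"},
}

differences = diff_configs(reported, maintainer)
for key, (old, new) in differences.items():
    print(f"{key}: {old!r} -> {new!r}")
```

In this subset, the differing keys are the dataset location and category, batch sizes, worker count, early-stopping patience, logger, and `log_level`. The model name and `export_mode` match in both configs, which is consistent with (though it does not prove) the library version being the relevant difference rather than the config.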

And here are the versions I used when trying to reproduce this:

>>> import anomalib
>>> anomalib.__version__
'1.0.0dev'
>>> import torch
>>> torch.__version__
'2.0.1+cu117'
>>> import onnx
>>> onnx.__version__
'1.14.0'
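
To collect the same information in one step before and after upgrading, a small sketch using only the standard library might look like the following. The helper name `installed_versions` is hypothetical. The package names are the three queried above, and it is an assumption that they are also the names of the installed distributions.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The three packages whose versions are listed in this comment.
report = installed_versions(["anomalib", "torch", "onnx"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Note that this reads installed package metadata rather than importing the modules, so a copy of Anomalib that is run from an unpacked source directory without being installed (the traceback paths in the report point at an `anomalib-0.4.0` folder) may show up as not installed here even though `import anomalib` works.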

I'm closing this issue. If you still experience issues, feel free to re-open it. Thanks!