openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

[Bug]: Patchcore Openvino Model fails to load "RuntimeError: std::bad_alloc" #989

Closed bansi-maddali closed 1 year ago

bansi-maddali commented 1 year ago

Describe the bug

I trained the model on the default MVTec dataset and generated OpenVINO IR models. The models fail to load because of insufficient memory.

I also tried converting the ONNX model to IR with the OpenVINO library. The IR models are generated, but they still fail to load.

However, the PyTorch Lightning (PL) model works without issues.

Dataset

MVTec

Model

PatchCore

Steps to reproduce the behavior

Steps:

  1. Cloned the 'master' branch and installed the package:

       git clone https://github.com/openvinotoolkit/anomalib.git
       cd anomalib
       pip install -e .

  2. Updated the PatchCore config.yaml to export to OpenVINO (export_mode is nested under optimization):

       optimization:
         export_mode: openvino # options: onnx, openvino

  3. Trained the PatchCore model:

       python tools/train.py --config src/anomalib/models/patchcore/config.yaml

  4. Training completed successfully and generated the OpenVINO IR files (.xml and .bin) as well as the .onnx model.

  5. Ran inference with the provided OpenVINO inference script:

       python tools/inference/openvino_inference.py \
           --config src/anomalib/models/patchcore/config.yaml \
           --weights results/patchcore/mvtec/bottle/run/openvino/model.xml \
           --metadata results/patchcore/mvtec/bottle/run/openvino/metadata.json \
           --input datasets/MVTec/bottle/test/broken_large/000.png \
           --output results/patchcore/mvtec/bottle/images

OS information

No response

Expected behavior

The generated IR models should load and run inference as expected.

Screenshots

No response

Pip/GitHub

GitHub

What version/branch did you use?

master

Configuration YAML

dataset:
  name: mvtec
  format: mvtec
  path: ./datasets/MVTec
  task: segmentation
  category: bottle
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: 224 # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: true
  layers:
    - layer2
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: openvino # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
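Using the values in this configuration, the sketch below gives a rough size estimate for the PatchCore memory bank and for the distance computation at inference time. Several inputs are assumptions, not taken from this issue: 209 "good" training images for the MVTec AD bottle category, a 28x28 patch grid (224 px center crop with the stride-8 layer2 output, layer3 resampled to the same grid), an embedding of 512 + 1024 = 1536 channels from wide_resnet50_2 layer2 and layer3, a coreset size of int(ratio * patches), and float32 storage. The function name is hypothetical, and whether the exported graph actually materializes a broadcast difference tensor is not established in this thread.

```python
# Rough, illustrative memory estimate for the PatchCore setup above.
# ASSUMPTIONS (not from the issue): 209 training images (MVTec AD bottle),
# stride-8 layer2 features -> 28x28 grid at a 224 px crop,
# 512 + 1024 = 1536 embedding channels, float32 values throughout.
BYTES_PER_FLOAT32 = 4


def patchcore_memory_estimate(num_train_images: int = 209,
                              crop_size: int = 224,
                              feature_stride: int = 8,
                              embedding_dim: int = 512 + 1024,
                              coreset_ratio: float = 0.1) -> dict:
    """Hypothetical helper: byte counts for the main PatchCore tensors."""
    grid = crop_size // feature_stride           # 224 // 8 = 28 with the defaults
    patches_per_image = grid * grid              # patch embeddings per image
    total_patches = num_train_images * patches_per_image
    coreset_size = int(total_patches * coreset_ratio)  # rows kept in the memory bank
    return {
        "patches_per_image": patches_per_image,
        "coreset_size": coreset_size,
        # Memory bank, assumed to be stored as a float32 constant in the exported model.
        "memory_bank_bytes": coreset_size * embedding_dim * BYTES_PER_FLOAT32,
        # Pairwise distances for one test image: (patches x memory-bank rows).
        "distance_matrix_bytes": patches_per_image * coreset_size * BYTES_PER_FLOAT32,
        # The same distances computed via a broadcast (patches x rows x dim) difference tensor.
        "broadcast_tensor_bytes": (patches_per_image * coreset_size
                                   * embedding_dim * BYTES_PER_FLOAT32),
    }


if __name__ == "__main__":
    for name, value in patchcore_memory_estimate().items():
        if name.endswith("_bytes"):
            print(f"{name}: {value / 2**30:.2f} GiB")
        else:
            print(f"{name}: {value}")
```

Under these assumptions the memory bank is roughly 96 MiB and one image's distance matrix about 49 MiB, whereas a broadcast difference tensor would need about 73.5 GiB, which is the scale at which an allocation would be expected to fail.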

Logs

Traceback (most recent call last):
  File "tools/inference/openvino_inference.py", line 106, in <module>
    infer()
  File "tools/inference/openvino_inference.py", line 82, in infer
    inferencer = OpenVINOInferencer(path=args.weights, metadata_path=args.metadata, device=args.device)
  File "/home/jupyter/anomalib_git/anomalib/src/anomalib/deploy/inferencers/openvino_inferencer.py", line 45, in __init__
    self.input_blob, self.output_blob, self.network = self.load_model(path)
  File "/home/jupyter/anomalib_git/anomalib/src/anomalib/deploy/inferencers/openvino_inferencer.py", line 80, in load_model
    executable_network = ie_core.load_network(network=network, device_name=self.device)
  File "ie_api.pyx", line 413, in openvino.inference_engine.ie_api.IECore.load_network
  File "ie_api.pyx", line 457, in openvino.inference_engine.ie_api.IECore.load_network
RuntimeError: std::bad_alloc
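One generic way to compute nearest-neighbour distances between N query patches and M memory-bank entries without allocating an N x M x D difference tensor is the expansion ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2: in a tensor framework the only N x M work is a single matrix product. The pure-Python sketch below only demonstrates that identity and a PatchCore-style k=1 nearest-neighbour score; the function names are hypothetical, and this is not necessarily the workaround referenced in #967.

```python
import math


def pairwise_dist_expanded(x, y):
    """Euclidean distances between rows of x (N x D) and rows of y (M x D).

    Uses ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 instead of forming a - b
    for every pair, so no N x M x D intermediate is conceptually required.
    """
    x_sq = [sum(v * v for v in a) for a in x]  # ||a||^2 for each row of x
    y_sq = [sum(v * v for v in b) for b in y]  # ||b||^2 for each row of y
    out = []
    for a, a_sq in zip(x, x_sq):
        row = []
        for b, b_sq in zip(y, y_sq):
            dot = sum(p * q for p, q in zip(a, b))
            # Clamp small negative values caused by floating-point cancellation.
            row.append(math.sqrt(max(a_sq - 2.0 * dot + b_sq, 0.0)))
        out.append(row)
    return out


def nearest_neighbor_scores(x, y):
    """Distance from each row of x to its nearest row of y (k = 1)."""
    return [min(row) for row in pairwise_dist_expanded(x, y)]
```

With NumPy or PyTorch the same expansion becomes two row-wise norm vectors plus one matrix multiplication, so peak memory scales with N x M rather than N x M x D.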


blaz-r commented 1 year ago

Hello. This is a known problem (#967), sadly without a proper solution at the moment.

blaz-r commented 1 year ago

@bansi-maddali, a workaround is now available in #967 if you haven't solved this yet.

samet-akcay commented 1 year ago

Closing this as a duplicate. You can follow #967 for updates. Thanks!