openvinotoolkit / anomalib

An anomaly detection library comprising state-of-the-art algorithms and features such as experiment management, hyper-parameter optimization, and edge inference.
https://anomalib.readthedocs.io/en/latest/
Apache License 2.0

ONNX inference and TensorRT optimisation #600

Closed · shrinand1996 closed 2 years ago

shrinand1996 commented 2 years ago

I would like to run PaDiM anomaly detection on a Jetson Nano on a live feed. When I use torch_inference, it works, but it is too slow (4 s per image for prediction). After analysis, I saw that the GPU was not being used at all. To give enough initialization time, I even ran the script 20 times (this reduced the inference time from 25 s to 4 s). I would like to know how to perform fast inference on the Jetson Nano using the scripts provided, or do I need to do something else to activate the GPU?
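(A quick first check for this symptom is to confirm where the model and the input actually live. A minimal sketch; `model` and `image_tensor` are stand-in names for the loaded AnomalyModule and the preprocessed input, not variables from the anomalib scripts:)

import torch

print(torch.cuda.is_available())          # expect True on a working Jetson setup
print(next(model.parameters()).device)    # "cpu" here means the GPU is never touched
print(image_tensor.device)                # must match the model's device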

ashwinvaidya17 commented 2 years ago

@shrinand1996 Thanks for reporting this bug. The quickest workaround you can use for now is to move the model to the GPU after this line https://github.com/openvinotoolkit/anomalib/blob/main/anomalib/deploy/inferencers/torch_inferencer.py#L86 and to move the image to the GPU before passing it to the model: https://github.com/openvinotoolkit/anomalib/blob/main/anomalib/deploy/inferencers/torch_inferencer.py#L119

shrinand1996 commented 2 years ago

@ashwinvaidya17 How do I do that?

ashwinvaidya17 commented 2 years ago

@shrinand1996 Can you try this:

def load_model(self, path: Union[str, Path]) -> AnomalyModule:
    """Load the PyTorch model.

    Args:
        path (Union[str, Path]): Path to model ckpt file.

    Returns:
        (AnomalyModule): PyTorch Lightning model.
    """
    model = get_model(self.config)
    model.load_state_dict(torch.load(path)["state_dict"])
    model.eval()

    # Move the model to the GPU once, at load time.
    return model.to("cuda")

def forward(self, image: Tensor) -> Tensor:
    """Forward-pass input tensor to the model.

    Args:
        image (Tensor): Input tensor.

    Returns:
        Tensor: Output predictions.
    """
    # Move each input to the GPU before the forward pass.
    return self.model(image.to("cuda"))

samet-akcay commented 2 years ago

@ashwinvaidya17, maybe we could consider adding device to TorchInferencer?
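(A minimal sketch of that idea; `resolve_device` is a hypothetical helper, not anomalib's actual API:)

import torch

def resolve_device(device: str = "auto") -> torch.device:
    # "auto" picks CUDA when available and falls back to CPU, so the
    # inferencer no longer has to hard-code .to("cuda").
    if device == "auto":
        device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.device(device)

# model.to(resolve_device()) and image.to(resolve_device()) would then
# replace the hard-coded calls in the snippet above.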

shrinand1996 commented 2 years ago

Thank you for your reply. I tried this and it ran without errors, but inference became much slower. If I didn't specify CUDA, it was around 3 s per image, but with CUDA it takes a whopping 130 s. After a few iterations, the inference time settles between 30 s and 4 s. I looked at the jtop output and noticed that the GPU was not being used at all. I checked that CUDA was available, and it was; if it were not, the code would raise errors. What could be causing this?
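(One thing worth ruling out when the numbers swing like this: CUDA kernels launch asynchronously, so wall-clock timing around a single call can measure context creation and warm-up rather than steady-state inference. A minimal timing sketch, assuming `model` and `batch` are already on the GPU:)

import time
import torch

def time_inference(model, batch, warmup: int = 5, iters: int = 20) -> float:
    """Average per-forward latency in seconds, with warm-up and explicit syncing."""
    with torch.no_grad():
        # Warm-up runs absorb one-time costs (CUDA context, cuDNN autotuning).
        for _ in range(warmup):
            model(batch)
        torch.cuda.synchronize()  # drain queued kernels before starting the clock
        start = time.time()
        for _ in range(iters):
            model(batch)
        torch.cuda.synchronize()  # kernels are asynchronous; wait before stopping
    return (time.time() - start) / iters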

ashwinvaidya17 commented 2 years ago

@shrinand1996 Did you try the linked PR? If so, can you share the commands you used to run the model and the model configuration file? Also, can you paste environment details such as the CUDA and PyTorch versions? I tried it on my end, and when saving predictions for the bottle dataset I get 4.7 s total inference time on CUDA and 4.9 s on CPU.
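(For reference, a quick way to collect those environment details from within Python:)

import torch

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))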

shrinand1996 commented 2 years ago

Hardware / Software Configuration

PyTorch version: 1.12.0a0+git67ece03
OS: Ubuntu 20.04
Hardware: Jetson Nano
CUDA version: 10.2.300
cuDNN: 8.2.1.32

Code:

from anomalib.data.utils import (
    generate_output_image_filename,
    get_image_filenames,
    read_image,
)
from anomalib.deploy import TorchInferencer
from anomalib.post_processing import Visualizer
import time

config_path = "custom/config.yaml"
weights = "custom/model.ckpt"
save_path = "test_results"

inferencer = TorchInferencer(config=config_path, model_source=weights)
visualizer = Visualizer(mode="simple", task="segmentation")

filenames = get_image_filenames(path="custom/images")
for filename in filenames:
    image = read_image(filename)
    start = time.time()
    predictions = inferencer.predict(image=image)  # time only the prediction call
    end = time.time()
    output = visualizer.visualize_image(predictions)
    print(end - start)
    if save_path:
        file_path = generate_output_image_filename(input_path=filename, output_path=save_path)
        visualizer.save(file_path=file_path, image=output)

Config File:

dataset:
  name: NMF
  format: folder
  path: /content/drive/MyDrive/NMF/2/
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: bad # name of the folder containing abnormal images.
  normal_test_dir: good # name of the folder containing normal test images.
  task: classification # classification or segmentation
  mask: 
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 512
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: false
    random_tile_count: 16
model:
  name: padim
  backbone: resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  normalization_method: min_max # options: [none, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 3
    pixel_default: 3
    adaptive: true

visualization:
  show_images: false # show images on the screen
  save_images: true # save images to the file system
  log_images: true # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null #options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.

I did not use the PR. I made the changes you suggested in the previous comments.

ashwinvaidya17 commented 2 years ago

@shrinand1996 What are your image dimensions?

shrinand1996 commented 2 years ago

I just uploaded the config file as well. The image size is 512

shrinand1996 commented 2 years ago

Hello, this works on Colab, but I don't see a major difference between CPU and GPU compute. Since PaDiM is big by nature, I guess the Jetson Nano GPU will not be sufficient for it. But why is PaDiM so slow? Even on a good GPU, 2 FPS is quite low.

Which anomaly detection model is best for real-time detection (>25 FPS)?
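(Since the issue title asks about ONNX and TensorRT: one route on a Jetson is to export the model to ONNX, e.g. via `export_mode: onnx` in the config's `optimization` section, and run it with ONNX Runtime's TensorRT execution provider. A minimal sketch, assuming an exported `model.onnx` and that the installed ONNX Runtime build ships the TensorRT/CUDA providers:)

import numpy as np
import onnxruntime as ort

# Provider order is a preference list: TensorRT first, then CUDA, then CPU.
session = ort.InferenceSession(
    "model.onnx",  # assumed path to the exported model
    providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
# Dummy NCHW float32 input matching the 512x512 config above; replace with a real preprocessed image.
dummy = np.random.rand(1, 3, 512, 512).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print([out.shape for out in outputs])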

ghost commented 2 years ago

@shrinand1996 FastFlow with a ResNet-18 backbone reaches 30.8 FPS according to the paper, but somehow it is slow with the anomalib implementation as well. I tested it on a 2070 Super and a 3060. I'm trying to use it for real-time anomaly detection too.

SimonB97 commented 1 year ago

@shrinand1996 Hi, I'm trying to do the same: predict on a Jetson Nano.

Would you mind telling me how you even got CUDA to work with anomalib on the Jetson? It would be very helpful if you could tell me which Python version, PyTorch / PyTorch Lightning GPU wheel and version, Linux version, and anomalib version you used!

Thank you in advance!