Closed shrinand1996 closed 2 years ago
@shrinand1996 Thanks for reporting this bug. The quickest workaround you can use for now is to move the model to the GPU after this line https://github.com/openvinotoolkit/anomalib/blob/main/anomalib/deploy/inferencers/torch_inferencer.py#L86 and to move the image to the GPU before passing it to the model https://github.com/openvinotoolkit/anomalib/blob/main/anomalib/deploy/inferencers/torch_inferencer.py#L119
@ashwinvaidya17 how to do that?
@shrinand1996 Can you try this?
def load_model(self, path: Union[str, Path]) -> AnomalyModule:
    """Load the PyTorch model.

    Args:
        path (Union[str, Path]): Path to model ckpt file.

    Returns:
        (AnomalyModule): PyTorch Lightning model.
    """
    model = get_model(self.config)
    model.load_state_dict(torch.load(path)["state_dict"])
    model.eval()
    return model.to("cuda")

def forward(self, image: Tensor) -> Tensor:
    """Forward-Pass input tensor to the model.

    Args:
        image (Tensor): Input tensor.

    Returns:
        Tensor: Output predictions.
    """
    return self.model(image.to("cuda"))
@ashwinvaidya17, maybe we could consider adding `device` to `TorchInferencer`?
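One way the suggested `device` option could behave is sketched below. This is only an illustration: `resolve_device`, its accepted `requested` values, and the `cuda_available` flag are hypothetical names, not part of anomalib's `TorchInferencer`. In practice the flag would come from `torch.cuda.is_available()`.

```python
def resolve_device(requested: str = "auto", cuda_available: bool = False) -> str:
    """Map a user-facing device option to a concrete device string.

    Hypothetical helper: "auto" picks CUDA when it is available and falls
    back to CPU otherwise; an explicit GPU request fails loudly instead of
    silently running on the CPU.
    """
    requested = requested.lower()
    if requested not in {"auto", "cpu", "cuda", "gpu"}:
        raise ValueError(f"Unknown device option: {requested!r}")

    if requested == "cpu":
        return "cpu"
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"

    # "cuda" or "gpu" was requested explicitly.
    if not cuda_available:
        raise RuntimeError("CUDA was requested but is not available")
    return "cuda"
```

The returned string could then be passed to `model.to(...)` in `load_model` and to `image.to(...)` in `forward`, replacing the hard-coded "cuda" in the workaround above, so the same code would still run on CPU-only machines.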
Thank you for your reply. I tried this and it ran without errors, but inference became much slower. When I didn't specify cuda, it took around 3 s per image. With cuda, it takes a whopping 130 s. After a few iterations, the inference time is between 30 s and 4 s. I looked at the jtop output and noticed that the GPU was not being used at all. I checked whether CUDA was available, and it was. If it is not, the code gives errors. What could be causing this?
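Timings like these are easier to compare when the first few calls are kept out of the measurement, since early calls on a GPU may include one-time start-up costs, and because CUDA work is queued asynchronously, a timer can stop before the GPU has finished. The helper below is a minimal sketch of that idea; `benchmark` and its parameters are hypothetical names, not anomalib functions.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 5,
    iters: int = 20,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Time `fn` over `iters` calls after discarding `warmup` calls."""
    if iters < 1:
        raise ValueError("iters must be at least 1")

    # Warm-up calls absorb one-time costs; their timings are not recorded.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued device work before stopping the clock
        timings.append(time.perf_counter() - start)

    median = statistics.median(timings)
    return {
        "median_s": median,
        "min_s": min(timings),
        "max_s": max(timings),
        "fps": 1.0 / median if median > 0 else float("inf"),
    }
```

For the script in this thread it could be called as `benchmark(lambda: inferencer.predict(image=image), sync=torch.cuda.synchronize)`, assuming `torch` is imported and the model runs on CUDA; on CPU, `sync` can be left as `None`.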
@shrinand1996 Did you try the linked PR? If so, can you share the commands you used to run the model and the model configuration file? Also, can you paste the environment details, such as the CUDA and PyTorch versions? I tried it on my end, and when saving predictions for the bottle dataset I get 4.7 s total inference time on CUDA and 4.9 s on CPU.
Pytorch Version: 1.12.0a0+git67ece03
OS: Ubuntu 20.04
Hardware: Jetson Nano
CUDA version: 10.2.300
cuDNN: 8.2.1.32
import warnings
import skimage
from argparse import ArgumentParser, Namespace
from pathlib import Path
from anomalib.data.utils import (
    generate_output_image_filename,
    get_image_filenames,
    read_image,
)
from anomalib.deploy import TorchInferencer
from anomalib.post_processing import Visualizer
import time

config_path = "custom/config.yaml"
weights = "custom/model.ckpt"
save_path = "test_results"

inferencer = TorchInferencer(config=config_path, model_source=weights)
visualizer = Visualizer(mode="simple", task="segmentation")
filenames = get_image_filenames(path="custom/images")

for filename in filenames:
    image = read_image(filename)
    start = time.time()
    predictions = inferencer.predict(image=image)
    end = time.time()
    output = visualizer.visualize_image(predictions)
    print(end - start)
    if save_path:
        file_path = generate_output_image_filename(input_path=filename, output_path=save_path)
        visualizer.save(file_path=file_path, image=output)
dataset:
  name: NMF
  format: folder
  path: /content/drive/MyDrive/NMF/2/
  normal_dir: good # name of the folder containing normal images.
  abnormal_dir: bad # name of the folder containing abnormal images.
  normal_test_dir: good # name of the folder containing normal test images.
  task: classification # classification or segmentation
  mask:
  extensions: null
  split_ratio: 0.2 # ratio of the normal images that will be used to create a test split
  image_size: 512
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
  transform_config:
    train: null
    val: null
  create_validation_set: true
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  normalization_method: min_max # options: [none, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    image_default: 3
    pixel_default: 3
    adaptive: true

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null #options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  accumulate_grad_batches: 1
  amp_backend: native
  auto_lr_find: false
  auto_scale_batch_size: false
  auto_select_gpus: false
  benchmark: false
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  default_root_dir: null
  detect_anomaly: false
  deterministic: false
  devices: 1
  enable_checkpointing: true
  enable_model_summary: true
  enable_progress_bar: true
  fast_dev_run: false
  gpus: null # Set automatically
  gradient_clip_val: 0
  ipus: null
  limit_predict_batches: 1.0
  limit_test_batches: 1.0
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  log_every_n_steps: 50
  max_epochs: 1
  max_steps: -1
  max_time: null
  min_epochs: null
  min_steps: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
  num_nodes: 1
  num_processes: null
  num_sanity_val_steps: 0
  overfit_batches: 0.0
  plugins: null
  precision: 32
  profiler: null
  reload_dataloaders_every_n_epochs: 0
  replace_sampler_ddp: true
  sync_batchnorm: false
  tpu_cores: null
  track_grad_norm: -1
  val_check_interval: 1.0 # Don't validate before extracting features.
I did not use the PR. I made the changes that you suggested on the previous comments.
@shrinand1996 What are your image dimensions?
I just uploaded the config file as well. The image size is 512
Hello, this works on Colab, but I don't see a major difference between CPU and GPU computing. Since Padim is big by nature, I guess the Jetson Nano GPU will not be sufficient for it. But why is Padim so slow? Even on a good GPU, 2 FPS is quite low.
Which anomaly detection model is best for real-time detection (>25 FPS)?
@shrinand1996 FastFlow with a ResNet18 backbone runs at 30.8 FPS according to its paper, but it is somehow slow as well in the anomalib implementation. I tested it on a 2070 Super and a 3060. I am trying to use it for real-time anomaly detection too.
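To put the numbers in this thread side by side, the short sketch below converts between seconds per image and frames per second. The function names are made up for illustration; the inputs are the figures quoted above (4 s per image on the Jetson Nano, about 2 FPS on a better GPU, and a 25 FPS target).

```python
def latency_to_fps(seconds_per_image: float) -> float:
    """Frames per second achieved at a given per-image latency."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 1.0 / seconds_per_image


def frame_budget_ms(target_fps: float) -> float:
    """Maximum time per image, in milliseconds, to sustain `target_fps`."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def required_speedup(seconds_per_image: float, target_fps: float) -> float:
    """Factor by which latency must shrink to reach `target_fps`."""
    return seconds_per_image * target_fps


print(frame_budget_ms(25))         # 40.0 ms per image for 25 FPS
print(latency_to_fps(4.0))         # 0.25 FPS at 4 s per image
print(required_speedup(4.0, 25))   # 100.0x faster needed from 4 s per image
print(required_speedup(0.5, 25))   # 12.5x faster needed from 2 FPS
```

So a 25 FPS target leaves roughly 40 ms for the whole per-image pipeline, including any pre- and post-processing that is counted in the measured latency.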
@shrinand1996 Hi, I'm trying to do the same: run predictions on a Jetson Nano.
Would you mind telling me how you got CUDA to work with anomalib on the Jetson? It would be very helpful if you could tell me which Python version, pytorch/pytorch-lightning GPU wheel and version, Linux version, and anomalib version you used.
Thank you in advance!
I would like to run Padim AD on a Jetson Nano on a live feed. When I use torch_inference, it works, but it's too slow (4 s per image for prediction). After analysis, I saw that the GPU was not being used at all. To allow enough initialization time, I even ran the script 20 times (this reduced the inference time from 25 s to 4 s). I would like to know how to perform fast inference on the Jetson Nano using the scripts provided, or whether I need to do something else to activate the GPU.