tnc-ca-geo / animl-ml

Machine Learning resources for camera trap data processing

Future optimizations #112

Open nathanielrindlaub opened 1 year ago

nathanielrindlaub commented 1 year ago

@rbavery had a few ideas for future optimizations of the Megadetector v5 endpoint that I wanted to document:

  1. test compiling the model with NeuralMagic for increased inference speed
  2. explore using test-time augmentation (during inference, perform a few different random transforms/pre-processing steps on the fly, request inference on all versions of the image, and then average the results) to boost model accuracy. This would come at the cost of potentially tripling (or more) our inference time, depending on how many augmentations we try and under what conditions, so we'd want to think it through a bit more and be sure the benefits outweigh the costs.
  3. use ONNX-compiled models across all endpoints for the sake of standardization (and perhaps some speed gains)
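As a rough sketch of idea 2, assuming `predict` stands in for a single MegaDetector forward pass returning per-class confidence scores (the augmentations and the averaging rule here are illustrative, not the endpoint's actual code):

```python
import numpy as np

def tta_predict(img, predict, augments=None):
    """Average `predict`'s scores over several augmented views of `img`.

    `predict` is a stand-in for one inference call; each augment is a
    function mapping an image to a transformed copy.
    """
    if augments is None:
        augments = [
            lambda im: im,                          # identity
            lambda im: im[:, ::-1],                 # horizontal flip
            lambda im: np.clip(im * 1.1, 0, 255),   # brightness jitter
        ]
    # One inference request per view; cost scales linearly with len(augments).
    scores = np.stack([predict(aug(img)) for aug in augments])
    return scores.mean(axis=0)
```

Note that for a detector, boxes from the flipped view would also have to be mapped back to the original frame before results can be merged; that bookkeeping is omitted here.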

@rbavery - anything else to add here??

nathanielrindlaub commented 1 year ago

From Dan:

Sometimes, if we're still missing animals, but one or both models look close, try again using YOLOv5's test-time augmentation tools via this alternative (but compatible) MD inference script.

nathanielrindlaub commented 10 months ago

Also according to Dan, "inference takes ~1.7x longer with TTA turned on". That's not as bad a hit as I was imagining, so very much worth evaluating.

rbavery commented 10 months ago

Just a heads up that I got to try running MDv5a compiled with TensorRT, and it was blazing fast. Example here: https://github.com/rbavery/animal_detector/blob/master/mdv5app/torchscript_to_tensorrt.py

It sped up inference roughly 10x on my GPU compared to running the TorchScript model without TensorRT.

This might be the most cost-effective option for bulk inference without requiring a change in architecture. It still uses TorchServe and virtually the same handler code paths.
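For reference, the Torch-TensorRT path in a script like that boils down to something like the following. This is a sketch, not a copy of the linked file: the model path, input shape, and precision settings are assumptions, and it requires a CUDA GPU with `torch` and `torch_tensorrt` installed.

```python
import torch
import torch_tensorrt

# Load the TorchScript export of MDv5a (path is hypothetical).
ts_model = torch.jit.load("md_v5a.0.0.torchscript").eval().cuda()

# Compile for TensorRT. MDv5 expects 1280x1280 letterboxed input;
# FP16 trades a little precision for a large speedup on most GPUs.
trt_model = torch_tensorrt.compile(
    ts_model,
    inputs=[torch_tensorrt.Input((1, 3, 1280, 1280), dtype=torch.half)],
    enabled_precisions={torch.half},
)

# The compiled module can then replace the plain TorchScript model
# in the existing TorchServe handler.
torch.jit.save(trt_model, "md_v5a_trt.ts")
```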