worldcoin / open-iris

Open Iris Recognition Inference System (IRIS)

Ways to speed up open-iris on GPU #44

Open · DimIsaev opened this issue 1 month ago

DimIsaev commented 1 month ago

I'm looking for ways to speed up inference from 0.5–1 s per frame (on different processors) down to 50–100 ms.

https://github.com/worldcoin/open-iris/blob/6b2fa096f7f196fc7e48d27bbb5e803c2b80e5bd/SEMSEG_MODEL_CARD.md#local-machine

Here you have an example of measuring speed with ONNX on GPU. Is there an example of how to switch ONNX from CPU to GPU?

Installing the onnxruntime-gpu package by itself does not solve the issue.
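
For reference, a quick way to check whether the GPU build is the one actually in use is to list the available providers (standard onnxruntime API; a common pitfall is having both onnxruntime and onnxruntime-gpu installed, in which case the CPU-only package can shadow the GPU one):

import onnxruntime as ort

# With a working GPU build this list should include 'CUDAExecutionProvider'.
print(ort.get_available_providers())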

DimIsaev commented 1 month ago

Example:

pip install onnxruntime-gpu

Then change line 79 of https://github.com/worldcoin/open-iris/blob/6b2fa096f7f196fc7e48d27bbb5e803c2b80e5bd/src/iris/nodes/segmentation/onnx_multilabel_segmentation.py#L79 from

session=ort.InferenceSession(model_path, providers=["CPUExecutionProvider"]),

to

session=ort.InferenceSession(model_path, providers=["CUDAExecutionProvider"]),

Right?
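
Roughly, yes. Two refinements worth noting: keeping "CPUExecutionProvider" as a fallback lets the same session still run on machines without a GPU, and get_providers() confirms which provider was actually selected. A minimal sketch (model_path is a placeholder for the actual checkpoint path):

import onnxruntime as ort

model_path = "iris_semseg_upp_scse_mobilenetv2.onnx"  # placeholder path

# Prefer CUDA; onnxruntime silently falls back to CPU if CUDA is unavailable.
session = ort.InferenceSession(
    model_path,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Verify which provider was actually selected.
print(session.get_providers())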

DimIsaev commented 1 month ago

Speed-up results: the increase is only about 2x, from 1200 ms to 540 ms, no more.
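
A plausible reason the end-to-end gain caps out around 2x (consistent with the next comment) is that only the segmentation node moved to the GPU, while the remaining pipeline nodes still run on the CPU. A minimal sketch for timing the full IRISPipeline call, following the package README's usage pattern (the image path is a placeholder):

import time

import cv2
import iris

pipeline = iris.IRISPipeline()
img = cv2.imread("sample_ir_image.png", cv2.IMREAD_GRAYSCALE)  # placeholder IR image

pipeline(img_data=img, eye_side="left")  # warm-up (model load, CUDA init)

start = time.perf_counter()
output = pipeline(img_data=img, eye_side="left")
print(f"full pipeline: {time.perf_counter() - start:.3f}s")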

EliteWeapon commented 1 month ago

> I'm looking for ways to speed up inference from 0.5–1 s per frame (on different processors) down to 50–100 ms.
>
> https://github.com/worldcoin/open-iris/blob/6b2fa096f7f196fc7e48d27bbb5e803c2b80e5bd/SEMSEG_MODEL_CARD.md#local-machine
>
> Here you have an example of measuring speed with ONNX on GPU. Is there an example of how to switch ONNX from CPU to GPU?
>
> Installing the onnxruntime-gpu package by itself does not solve the issue.

The time performance reported there refers only to the 'Iris Semantic Segmentation Model', not to the whole pipeline.

I ran a test on my device (GeForce RTX 3080 Ti, Intel i7-11700K, Windows 11) and the results are: CPUExecutionProvider: ~0.380 s; CUDAExecutionProvider: ~0.018 s. The test code looks like the following:

import time

class ONNXMultilabelSegmentation(MultilabelSemanticSegmentationInterface):

    ...  # other methods elided

    def run(self, image: IRImage) -> SegmentationMap:
        """Perform semantic segmentation prediction on an image.

        Args:
            image (IRImage): Infrared image object.

        Returns:
            SegmentationMap: Postprocessed model predictions.
        """
        nn_input = self._preprocess(image.img_data)

        # Time only the ONNX forward pass, excluding pre- and post-processing.
        start_time = time.time()
        prediction = self._forward(nn_input)
        end_time = time.time()
        print(f'inference time: {end_time - start_time:.3f}s')

        return self._postprocess(prediction, original_image_resolution=(image.width, image.height))

DimIsaev commented 1 month ago

The reported test results don't mention the test rig parameters, nor the fact that they cover only the model inference.

My test rig: i7-14700K / RTX 4070 Super.

How do I run the tests so that I get results comparable to yours?
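
For reference, one way to get a comparable, pipeline-independent measurement is to time the ONNX session directly with a warm-up pass. A sketch using plain onnxruntime (the checkpoint filename and the dummy input are assumptions, not specifics confirmed in this thread):

import time

import numpy as np
import onnxruntime as ort

# Placeholder checkpoint name; use the file downloaded from the repo or HF.
session = ort.InferenceSession(
    "iris_semseg_upp_scse_mobilenetv2.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

inp = session.get_inputs()[0]
# Replace any dynamic dimensions (strings/None) with 1 to build a dummy input.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

session.run(None, {inp.name: dummy})  # warm-up: CUDA kernel setup, allocations

runs = 50
start = time.perf_counter()
for _ in range(runs):
    session.run(None, {inp.name: dummy})
print(f"mean forward time: {(time.perf_counter() - start) / runs:.4f}s")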

DimIsaev commented 1 month ago

Can I hope for an answer?

Maybe we should close the Issues section?

wiktorlazarski commented 1 month ago

@DimIsaev

Thank you for raising the issue. Yes, we are aware that running the semantic segmentation model on the GPU improves the time performance of the IRISPipeline call. We plan to introduce that in the future. In the first version of open-iris we aimed for simplicity of installation and usage of the package.

Regarding a possible further speed-up of the semantic segmentation model inference, you may have a look at ONNX Runtime's model optimizations and experiment with modifying the model's ONNX file directly. Here is a write-up on that: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html. I'm sure that a careful examination of the methods presented there will let you improve inference speed. You should be able to find the model checkpoint, after download, in this directory: https://github.com/worldcoin/open-iris/tree/dev/src/iris/nodes/segmentation/assets, or it is available in our HF repo: https://huggingface.co/Worldcoin/iris-semantic-segmentation/tree/main.
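
As a concrete starting point for the graph-optimization route mentioned above, onnxruntime exposes these settings through SessionOptions; a minimal sketch ("model.onnx" and "model_optimized.onnx" are placeholder paths):

import onnxruntime as ort

sess_options = ort.SessionOptions()
# Apply all available graph optimizations (constant folding, node fusion, ...).
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Optionally write the optimized graph to disk for inspection or offline reuse.
sess_options.optimized_model_filepath = "model_optimized.onnx"

session = ort.InferenceSession(
    "model.onnx",  # placeholder path
    sess_options,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)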

Best regards, Wiktor