openvino-dev-samples / YOLOv7_OpenVINO_cpp-python

This repository demonstrates how to deploy an official YOLOv7 pre-trained model with the OpenVINO runtime API.

getting setup #4

Closed bbartling closed 1 year ago

bbartling commented 1 year ago

Hi,

In the README, is there a choice to do either Python or C++ on Ubuntu? Or do I need to do both the Python and the C++ steps on Ubuntu?

(screenshot of the README)

Cool repo, thanks for making this...

openvino-dev-samples commented 1 year ago

Yes. If you'd like to try the Python version, you can just set up the environment with "pip install". For C++, you need to download the archived library from the link I provided.

bbartling commented 1 year ago

Hi,

The only thing I am trying to figure out is how to show the inference results. I am running this on Windows 10 with no errors.

Using the main script:

$ py -3.7 main.py -m yolov7.onnx -i images/horse.jpg

Using the main_preprocessing script:

$py -3.7 main_preprocessing.py -m yolov7.onnx -i images/horse.jpg
Dump preprocessor: Input "images" (color BGR):
    User's input tensor: {1,640,640,3}, [N,H,W,C], f32
    Model's expected tensor: {1,3,640,640}, [N,C,H,W], f32
    Pre-processing steps (2):
      convert color (RGB): ({1,640,640,3}, [N,H,W,C], f32, BGR) -> ({1,640,640,3}, [N,H,W,C], f32, RGB)
      scale (255,255,255): ({1,640,640,3}, [N,H,W,C], f32, RGB) -> ({1,640,640,3}, [N,H,W,C], f32, RGB)
    Implicit pre-processing steps (1):
      convert layout [N,C,H,W]: ({1,640,640,3}, [N,H,W,C], f32, RGB) -> ({1,3,640,640}, [N,C,H,W], f32, RGB)
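For reference, the three transforms in that dump correspond roughly to this numpy equivalent (shapes taken from the dump above; in practice the OpenVINO runtime performs these steps on the device, this is just a sketch of the math):

```python
import numpy as np

# a stand-in BGR frame, f32, already resized to 640x640 as in the dump
bgr = np.random.rand(1, 640, 640, 3).astype(np.float32) * 255

rgb = bgr[..., ::-1]                 # step 1: convert color BGR -> RGB
scaled = rgb / 255.0                 # step 2: scale by (255, 255, 255)
nchw = scaled.transpose(0, 3, 1, 2)  # implicit step: layout NHWC -> NCHW

print(nchw.shape)  # (1, 3, 640, 640)
```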

Is there an imshow or save output to a different directory? Nothing pops up on my end...

Also curious to know what the difference between main and main_preprocessing is.

Is the Preprocessing API best for using a video file or webcams? That's my ultimate goal: tinkering around with some webcams in an IoT setting with the Neural Compute Stick for people detection.

bbartling commented 1 year ago

The info above can be ignored... I have something to work with now. On my fork I made an IoT app that renders the computer vision results in the browser with Flask using a webcam, and it also incorporates a REST endpoint for a people count. I work at a research firm where we are trying to find technology to count people inside buildings.

I don't have a ton of programming experience, but any tips are appreciated here... If I would like to use a Neural Compute Stick 2, would I be better off using the Python main version or the main_preprocessing approach?

Could one also incorporate the async features shown in the OpenVINO notebooks? https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/115-async-api#asynchronous-inference-with-openvino

Thanks again for any time you have to respond...

openvino-dev-samples commented 1 year ago

Thanks for sharing this update. I haven't tried YOLOv7 on the Neural Compute Stick 2. I suggest running "main_preprocessing" with a GPU device (iGPU or dGPU), because it lets you offload the preprocessing and inference overhead from the CPU to the GPU.

The async features would be an optimization step for video analytics; here I just demonstrate a single-image pipeline. Maybe I will update it with video frame input, and then we can see how the async API works with this sample.

bbartling commented 1 year ago

Hi,

For what it's worth, in my fork of your repo I have this utils.py that I grabbed from the openvino_notebooks async demonstration on one of their pretrained people-detection models, which contains a class for AsyncPipeline.

It would be awesome if the repo could be updated to include a video file/webcam feature... With the pretrained people-detection models the FPS was really good, but I wasn't very impressed with the overall accuracy. The YOLO models run really slow, only about half an FPS, but the accuracy is really good.

If you don't mind me asking, why does the computer vision industry call it a "pipeline"? I am new to this field; my background was PLC programming for operations technology (OT), so I am trying to pick up the acronyms and theory. Thanks for any time you have to respond.

openvino-dev-samples commented 1 year ago

Thanks for your advice; I will update it with async API support later. As for your question on "pipeline": it means a whole AI workload can include many steps, for example: video decoding -> image preprocessing -> inference -> results postprocessing -> encoding or rendering.
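The stage names above can be sketched as plain functions where each stage's output feeds the next; this toy sketch (with hypothetical stage names, stubbed with strings instead of real frames) shows why "pipeline" is a natural word for it:

```python
# hypothetical stages; each one consumes the previous stage's output
def decode(source):       # pull one frame out of a video stream
    return f"frame({source})"

def preprocess(frame):    # resize, color conversion, layout change
    return f"tensor({frame})"

def infer(tensor):        # model forward pass
    return f"raw({tensor})"

def postprocess(raw):     # thresholding, non-max suppression
    return f"boxes({raw})"

def pipeline(source):
    return postprocess(infer(preprocess(decode(source))))

print(pipeline("cam0"))  # boxes(raw(tensor(frame(cam0))))
```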

bbartling commented 1 year ago

I think I am going to close this issue and will keep an eye on the async features... Just out of curiosity, have you ever retrained an existing pre-trained Open Model Zoo model?

I am sort of curious to retrain their people detectors with my own dataset, but I am stuck on how they generate the annotation.json file... If you have any advice it's greatly appreciated!! Cheers - https://community.intel.com/t5/Intel-Distribution-of-OpenVINO/openVINO-extensions-retrain-a-model/m-p/1430483#M28640

openvino-dev-samples commented 1 year ago

For Python, we have a new feature called "InferQueue" to trigger async inference in a simpler way. You can take this example as a reference: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/214-vision-paddle-classification/214-vision-paddle-classification.ipynb
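The idea behind an infer queue is a fixed pool of parallel inference requests, each firing a callback when it finishes. As a rough stdlib stand-in for that pattern (not the OpenVINO API itself, just the same shape using concurrent.futures):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def infer(frame):
    time.sleep(0.01)          # stand-in for model latency
    return f"detections for {frame}"

results = {}

with ThreadPoolExecutor(max_workers=4) as pool:   # like a 4-slot queue
    for i in range(8):
        fut = pool.submit(infer, f"frame{i}")
        # callback collects each result as its request completes
        fut.add_done_callback(lambda f, i=i: results.update({i: f.result()}))

print(len(results))  # 8
```

The real AsyncInferQueue works the same way conceptually: submit frames without blocking, and consume detections in a callback.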

bbartling commented 1 year ago

Thanks for the info... If you don't mind me asking, could one create an async Python inference/webcam feed with asyncio that could work with any computer vision framework?

I have been daydreaming about this pseudocode concept, but I don't know enough to tell whether it would work in reality... Any time you have to respond is greatly appreciated; you seem quite wise in this sector : )

Pseudocode idea for an async webcam feed: hold 2 frames at a time and run model inference on both:

import asyncio
import cv2

class FrameHolder:
    """Shared slots between the capture task and the inference task."""
    def __init__(self):
        self.frame1 = None
        self.frame2 = None

async def Inference_Machine():
    while True:
        if frames.frame1 is not None:
            # run inference on slot 1, then mark the slot free
            model.detect(frames.frame1)
            frames.frame1 = None
        elif frames.frame2 is not None:
            # run inference on slot 2, then mark the slot free
            model.detect(frames.frame2)
            frames.frame2 = None
        # yield control so the capture task gets scheduled
        await asyncio.sleep(0.01)

async def Video_Streamer():
    video_capture = cv2.VideoCapture(0)
    frame_counter = 1
    while True:
        ret, image = video_capture.read()
        if ret:
            # write into whichever slot is currently empty
            if frame_counter == 1 and frames.frame1 is None:
                frames.frame1 = image
                frame_counter = 2
            elif frame_counter == 2 and frames.frame2 is None:
                frames.frame2 = image
                frame_counter = 1
        await asyncio.sleep(0.01)

async def main():
    await asyncio.gather(
        Video_Streamer(),
        Inference_Machine(),
    )

if __name__ == "__main__":
    frames = FrameHolder()
    model = cv2.SomeModel()  # placeholder for whatever detection model is used
    asyncio.run(main())

openvino-dev-samples commented 1 year ago

Hi @bbartling, the code is updated. You can change the batch size with the "-bs" parameter to process more than 1 frame at the same time.
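Batching just means stacking several preprocessed frames along a new first axis before a single inference call; a numpy sketch, assuming the model's 640x640 CHW input from earlier in this thread:

```python
import numpy as np

# two preprocessed frames, each already in CHW layout
frames = [np.zeros((3, 640, 640), dtype=np.float32) for _ in range(2)]

batch = np.stack(frames)  # one (2, 3, 640, 640) tensor, i.e. batch size 2
print(batch.shape)  # (2, 3, 640, 640)
```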

bbartling commented 1 year ago

Cool stuff, I'll submit a new issue for getting started on the webcam.