Inference of all three stages using GPU?

sergiomsilva / alpr-unconstrained

License Plate Detection and Recognition in Unconstrained Scenarios

Inference of all three stages using GPU? #86

Open PhilipsKoshy opened 5 years ago

PhilipsKoshy commented 5 years ago

When this is used with video, we'd like the entire inference (vehicle detection + license plate detection + OCR) to be as fast as possible, so that we can process as many frames as possible. When I have a frame, I'd like to run all three inferences on the GPU before starting on the next frame. Otherwise, we lose the gain from GPU acceleration by moving the image back and forth between CPU and GPU. Unfortunately, these stages work on stored files. Any thoughts?

fadi212 commented 5 years ago

Following. Hey, have you worked on this so far? I am thinking about doing so, but I just can't get my head around how to start and what changes I have to make for this to work.

PhilipsKoshy commented 5 years ago

For taking video input, I found the following useful: https://github.com/sergiomsilva/alpr-unconstrained/issues/57#issuecomment-511706352

But this repo expects images stored in files on disk for every stage (VD, LPD, OCR). I guess that is what the original Darknet expects. So I guess we need to modify the Darknet code to accept an image matrix in GPU FB rather than an image file on disk. We can also try further optimizations such as pinned memory and DMA transfer to GPU FB. I'm not able to try these now. If you happen to try them out, please share the details.

MingRongXi commented 3 years ago

Hi, PhilipsKoshy! My program runs slowly on both my CPU and GPU. It takes four seconds to process an image. My CPU has 6 cores and 6 processes, and the memory is 8GB. My GPU is a Tesla M10 with 8GB. How many frames per second can you process, and what computing environment are you in?

PhilipsKoshy commented 3 years ago

Did you ensure that the GPU acceleration is actually happening? Did you check with nvidia-smi or any similar utility? Is the YOLO built to make use of the GPU?
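One way to run the nvidia-smi check from a script is sketched below. It assumes the NVIDIA driver tools are installed; the helper names `parse_gpu_csv` and `query_gpus` are made up for this example and are not part of this repo. Polling it while the pipeline runs should show non-zero utilization and memory use if the GPU is really being used.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; they come back in this order, comma-separated.
QUERY_FIELDS = ["name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        gpus.append(dict(zip(QUERY_FIELDS, parts)))
    return gpus


def query_gpus():
    """Return a list of GPU dicts, or None if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Prints None on a machine without nvidia-smi.
    print(query_gpus())
```

This only shows whether something is using the GPU. It does not say which of the three stages is using it, so a stage that silently falls back to the CPU can still hide behind the others.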

MingRongXi commented 3 years ago

Yes. I compiled Darknet with CUDA and GPU support, so the vehicle detection and OCR stages are fast. But wpod-net is slow, although I ran it with tensorflow-gpu and keras-gpu. On my machine, the speed of tensorflow-gpu and tensorflow is basically the same: both are slow.

PhilipsKoshy commented 3 years ago

I worked on it a while back, so this is from memory. I avoided saving the image to a file; instead, I modified the code to hand the frame over in memory to the next stages, which eliminates the file I/O. If I am able to dig up my old work, I will post it here later.
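As a rough illustration of that idea (not the original code, which was never posted here), the sketch below chains the three stages so that each one receives the previous stage's output as an in-memory object. The stage functions are hypothetical stand-ins that work on strings; in a real setup they would wrap the Darknet vehicle detector, wpod-net, and the OCR network and pass image arrays instead.

```python
def run_pipeline(frames, detect_vehicles, detect_plates, read_plate):
    """Yield the plate strings found in each frame, with no file I/O between stages."""
    for frame in frames:
        plates_text = []
        for vehicle_crop in detect_vehicles(frame):      # stage 1: vehicle detection
            for plate_crop in detect_plates(vehicle_crop):  # stage 2: plate detection
                plates_text.append(read_plate(plate_crop))  # stage 3: OCR
        yield plates_text


# Toy stand-ins so the sketch runs without any model (hypothetical, for illustration).
def fake_detect_vehicles(frame):
    return [v for v in frame.split("|") if v]


def fake_detect_plates(vehicle):
    _, _, plate = vehicle.partition(":")
    return [plate] if plate else []


def fake_read_plate(plate):
    return plate.upper()


frames = ["car:abc123|car:xyz789", ""]
results = list(
    run_pipeline(frames, fake_detect_vehicles, fake_detect_plates, fake_read_plate)
)
print(results)
```

Note that this only removes the disk round trip. The crops still pass through host memory between stages, so keeping everything on the GPU would need the deeper Darknet changes discussed earlier in the thread.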

MingRongXi commented 3 years ago

Oh, thank you very much! But on my machine, the biggest factor affecting speed is the wpod-net stage, not I/O. Do you remember your FPS and computing environment?
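To compare numbers across machines, it may help to time each stage separately. Below is a minimal sketch using only the standard library; `StageTimer` and the stage names are made up for this example, and the `time.sleep` calls stand in for the real inference calls. The first call to a GPU model often includes model loading and CUDA initialization, so it is probably worth discarding the first frame or two before reading the averages.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def measure(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return the average seconds per call for each stage."""
        return {name: self.totals[name] / self.counts[name] for name in self.totals}


timer = StageTimer()
for _ in range(3):  # pretend to process three frames
    with timer.measure("vehicle_detection"):
        time.sleep(0.001)  # placeholder for the Darknet vehicle detector
    with timer.measure("lp_detection"):
        time.sleep(0.005)  # placeholder for wpod-net
    with timer.measure("ocr"):
        time.sleep(0.001)  # placeholder for the OCR network

averages = timer.report()
for name, seconds in averages.items():
    print(f"{name}: {seconds * 1000:.1f} ms per call")
```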