techwingslab / yolov5-net

YOLOv5 object detection with C#, ML.NET, ONNX
MIT License
344 stars · 104 forks

Performance suggestion #47

Open gillonba opened 2 years ago

gillonba commented 2 years ago

This is a great project, but unfortunately performance is pretty awful compared to running the same models in Python. It takes so long to prepare the image and to parse the results that I don't think it really even matters if you run the model on the GPU or not.

I think one thing that would improve things quite a bit would be to switch from Parallel.For to standard for loops on lines 102 and 106 of YoloScorer.cs.

In my (limited) testing I saw nearly a 50% speed boost from that change alone. The tiny amount of work done inside those loops hardly justifies the overhead of parallelization. Maybe leaving the Parallel.For on the outside makes sense, I don't know. Unfortunately I'm not really familiar with ML.NET or YOLO, but I'll see if there are any other easy ways to improve things. Otherwise it is a great project: very easy to use, with good results.
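For anyone who wants to reproduce the comparison, the overhead is easy to measure in isolation. The sketch below is a minimal, self-contained Stopwatch harness, not tied to YoloScorer.cs; the float array and the tiny loop body are stand-ins for the per-pixel normalization work, chosen just to illustrate the measurement, so absolute numbers will differ from the real scorer:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LoopBench
{
    public static void Main()
    {
        const int height = 640, width = 640; // typical YOLOv5 input size
        var data = new float[height * width];

        // Warm up the JIT and the thread pool before timing anything.
        Fill(data, width, height, parallel: true);
        Fill(data, width, height, parallel: false);

        foreach (bool parallel in new[] { true, false })
        {
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 100; i++)
                Fill(data, width, height, parallel);
            sw.Stop();
            Console.WriteLine($"parallel={parallel}: {sw.ElapsedMilliseconds} ms for 100 passes");
        }
    }

    // Tiny per-element body, comparable in size to the normalization
    // done per pixel in the scorer's preprocessing loop.
    static void Fill(float[] data, int width, int height, bool parallel)
    {
        if (parallel)
        {
            Parallel.For(0, height, y =>
            {
                for (int x = 0; x < width; x++)
                    data[y * width + x] = x / 255.0F;
            });
        }
        else
        {
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    data[y * width + x] = x / 255.0F;
        }
    }
}
```

Run it as a Release build (see the later comments in this thread); a Debug build or running under vshost will skew both numbers.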

gaohuijue commented 2 years ago

I did some optimizations; you can see them in Yolov5Net-Faster. I use OpenCL and OpenCV to speed up the computations. I'm new to OpenCL and I'm not sure my OpenCL code is the best approach. (Sorry, my English is poor.) Where is faster

ydslash2 commented 2 years ago

> I did some optimizations; you can see them in Yolov5Net-Faster. I use OpenCL and OpenCV to speed up the computations. I'm new to OpenCL and I'm not sure my OpenCL code is the best approach. (Sorry, my English is poor.) Where is faster

Gaohuijue? That sounds like the name of a senior monk. Your project is very good and processes quickly, but I don't know how to adjust the number of labels. Could you teach me? Right now I can only use an .onnx trained with 80 labels; I can't add labels.

gaohuijue commented 2 years ago

> I did some optimizations; you can see them in Yolov5Net-Faster. I use OpenCL and OpenCV to speed up the computations. I'm new to OpenCL and I'm not sure my OpenCL code is the best approach. (Sorry, my English is poor.) Where is faster

> Gaohuijue? That sounds like the name of a senior monk. Your project is very good and processes quickly, but I don't know how to adjust the number of labels. Could you teach me? Right now I can only use an .onnx trained with 80 labels; I can't add labels.

I'm guessing your problem is here: [image]

ydslash2 commented 2 years ago

I changed the two places above, but it didn't help; it still throws an error. Do you have QQ? Add me: 378234608

Jcrueger commented 2 years ago

I tried this out by changing the Parallel.For loops to regular for loops like this:

```csharp
for (int y = 0; y < bitmapData.Height; y++)
{
    byte* row = (byte*)bitmapData.Scan0 + (y * bitmapData.Stride);

    for (int x = 0; x < bitmapData.Width; x++)
    {
        tensor[0, 0, y, x] = row[x * bytesPerPixel + 2] / 255.0F; // r
        tensor[0, 1, y, x] = row[x * bytesPerPixel + 1] / 255.0F; // g
        tensor[0, 2, y, x] = row[x * bytesPerPixel + 0] / 255.0F; // b
    }
}
```

It is a lot slower this way for me. Were you running a Release build when you tested the speed, or running through vshost? Or perhaps you did things differently?

gillonba commented 2 years ago

I work primarily on Linux, so it could have to do with different implementations of the .NET runtime. I did run a test on a Windows PC and performance seemed much better there, though I did not evaluate which method is best. Intuitively though, since there is a fair amount of overhead in starting a Parallel.For, it would seem best to at least convert the inner loop to a standard for. It seems unlikely that we will see CPUs in the near future that can make use of more than bitmapData.Height threads anyway.
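The compromise described here, parallel across rows but sequential within a row, might look roughly like the sketch below. This is a hypothetical method modeled on the pixel-extraction snippet quoted earlier in this thread, not the project's exact code; the `DenseTensor<float>` type and the 24bpp pixel format are assumptions from that context. The Parallel.For overhead is then paid once per image instead of once per row:

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Threading.Tasks;
using Microsoft.ML.OnnxRuntime.Tensors;

public static class PreprocessSketch
{
    // Hypothetical helper: fills an NCHW float tensor from a Bitmap.
    // Only the outer loop over rows runs in parallel; each row's pixels
    // are processed sequentially, so the tiny per-pixel body is not
    // individually scheduled onto the thread pool.
    public static unsafe void FillTensor(Bitmap image, DenseTensor<float> tensor)
    {
        var bitmapData = image.LockBits(
            new Rectangle(0, 0, image.Width, image.Height),
            ImageLockMode.ReadOnly,
            PixelFormat.Format24bppRgb);

        const int bytesPerPixel = 3; // 24bpp RGB assumed

        try
        {
            Parallel.For(0, bitmapData.Height, y =>
            {
                byte* row = (byte*)bitmapData.Scan0 + (y * bitmapData.Stride);

                for (int x = 0; x < bitmapData.Width; x++)
                {
                    tensor[0, 0, y, x] = row[x * bytesPerPixel + 2] / 255.0F; // r
                    tensor[0, 1, y, x] = row[x * bytesPerPixel + 1] / 255.0F; // g
                    tensor[0, 2, y, x] = row[x * bytesPerPixel + 0] / 255.0F; // b
                }
            });
        }
        finally
        {
            image.UnlockBits(bitmapData);
        }
    }
}
```

Each Parallel.For iteration now does width × 3 writes of real work, which is much more likely to amortize the scheduling cost than a per-pixel lambda; whether it beats the fully sequential version will depend on image size, core count, and platform, as the Linux/Windows difference in this thread suggests.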