pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Training YOLO on high resolution images #1700

Open ronias opened 5 years ago

ronias commented 5 years ago

I want to use YOLO v3 to detect small objects (drones) in high-resolution images of about 4K. The drones are about 30x30 pixels.

I noticed that the darknet cfg file has image width and height settings. If I change them to 4K and start training, I immediately get an error while the network configuration is being loaded: ./src/cuda.c:36: check_error: Assertion `0' failed

If I use the default values of height = 416 and width = 416, then I think the image will be resized to those values before the first layer and a lot of detail will be lost.

Is it possible to train YOLO with high resolution?

DeepNoob commented 5 years ago

My guess: the amount of data fed into the GPU is too large. As a first attempt (if my guess is correct), I'd set subdivisions to the same number as the batch size, so that only 1 image is processed on the GPU at any point. Then you can start reducing subdivisions until it fails.
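A minimal sketch of the arithmetic behind this suggestion, assuming darknet processes batch / subdivisions images per GPU pass (integer division). The function name is hypothetical and is not part of darknet.

```python
def images_per_gpu_pass(batch: int, subdivisions: int) -> int:
    """Images held on the GPU at once for a given cfg batch/subdivisions pair.

    Assumption: darknet splits each batch into `subdivisions` equal chunks
    and processes one chunk at a time (integer division).
    """
    if batch < 1 or subdivisions < 1:
        raise ValueError("batch and subdivisions must be positive")
    return batch // subdivisions


# Setting subdivisions equal to batch leaves one image per pass.
for batch, subdivisions in [(64, 16), (64, 32), (64, 64)]:
    print(batch, subdivisions, images_per_gpu_pass(batch, subdivisions))
```

With subdivisions equal to batch, the per-pass count cannot go any lower, so any remaining out-of-memory failure would have to come from the per-image cost itself.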

ronias commented 5 years ago

I tried different batch sizes and subdivisions, such as batch = 1 with subdivisions = 1, or batch = 2 with subdivisions = 2 or 1, but it doesn't work. The highest resolution I managed to get working is 1024x1024.

My GPU is Nvidia GTX 1080 TI with 11GB memory.
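For a rough sense of scale, the output of a single full-resolution convolutional layer can be estimated with plain arithmetic. The sketch below assumes 32 filters at stride 1 (as in the first convolutional layer of the stock yolov3.cfg) and 4-byte floats. It ignores every other layer, gradients, and workspace buffers, so it is only a lower bound for one layer, not a measurement of what darknet actually allocates.

```python
def layer_output_bytes(width: int, height: int,
                       filters: int = 32, bytes_per_value: int = 4) -> int:
    """Bytes needed to store one stride-1 conv layer output for one image.

    Assumptions (not taken from the thread): 32 filters and float32 values.
    """
    return width * height * filters * bytes_per_value


GIB = 1024 ** 3
for w, h in [(416, 416), (1024, 1024), (4096, 2160)]:
    size = layer_output_bytes(w, h)
    print(f"{w}x{h}: {size / GIB:.3f} GiB for one layer output, one image")
```

The per-layer cost grows linearly with pixel count, so a 4096x2160 input costs about 51 times as much per layer as 416x416, regardless of batch size.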

enesyuceyurt commented 5 years ago

This is a very challenging task. It's nearly impossible to detect such small objects with a regular network. I would split each picture into 4 parts along each axis (a 4x4 grid) and train on those tiles. That also gives you 16 times as many training images. Otherwise it's really difficult for YOLO to detect such small objects in such a huge image. If each photo is 4096x2160, the new photos will be 1024x540, so you can create a network with a 1024x544 input (width and height should be divisible by 32), which will suit you best. If you want to detect small objects even better, you can use yolov3-spp.
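The 1024x540 tile size and the 1024x544 network input mentioned above can be checked with a small sketch. It assumes the 4096x2160 image is cut into 4 pieces along each axis and that each dimension is rounded up to the next multiple of 32. The helper name is hypothetical.

```python
def round_up_to_multiple(value: int, multiple: int = 32) -> int:
    """Smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


image_w, image_h = 4096, 2160
pieces_per_axis = 4  # assumption: 4 cuts along each axis

tile_w, tile_h = image_w // pieces_per_axis, image_h // pieces_per_axis
net_w, net_h = round_up_to_multiple(tile_w), round_up_to_multiple(tile_h)

print("tile size:", tile_w, tile_h)
print("network input:", net_w, net_h)
```

The resulting `net_w` and `net_h` are the values that would go into `width` and `height` in the cfg file.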

xzessmedia commented 5 years ago

In my case I used a huge bunch of different photos at different resolutions, and YOLO scaled everything while training. No problem!

It depends on your configuration and your hardware. If you have low-end hardware, it can be quite difficult to train your model "effectively".

dang-qi commented 4 years ago

Have you managed to train it on high-resolution images (even 1024x1024)? It doesn't seem to work well.

mgupta70 commented 2 years ago

You will have to divide the large image into non-overlapping sub-images of smaller size (e.g. on the order of 400 small images of 224x224 from one big 4Kx4K sheet). Adjust the bounding boxes (BBOX) to match each crop; the Albumentations package can do this. Feed these small crops with their respective BBOXes to YOLO and train it. At test time, split the test image the same way, run the model on each sub-image, and stitch the results together.
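The crop-and-adjust step can be sketched in plain Python (this does not use Albumentations). It assumes boxes are given as pixel coordinates (x_min, y_min, x_max, y_max); converting to YOLO's normalized center format is left out. All names here are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def tile_origins(image_w: int, image_h: int, tile: int) -> List[Tuple[int, int]]:
    """Top-left corners of a non-overlapping grid; edge tiles may be smaller."""
    return [(x, y)
            for y in range(0, image_h, tile)
            for x in range(0, image_w, tile)]


def boxes_for_tile(boxes: List[Box], x0: int, y0: int,
                   tile_w: int, tile_h: int) -> List[Box]:
    """Clip boxes to the tile and shift them into tile-local coordinates.

    Boxes that do not intersect the tile are dropped.
    """
    local = []
    for x_min, y_min, x_max, y_max in boxes:
        cx_min, cy_min = max(x_min, x0), max(y_min, y0)
        cx_max, cy_max = min(x_max, x0 + tile_w), min(y_max, y0 + tile_h)
        if cx_min < cx_max and cy_min < cy_max:
            local.append((cx_min - x0, cy_min - y0, cx_max - x0, cy_max - y0))
    return local


# One 30x30 object in a 448x448 image, cut into 224x224 tiles.
boxes = [(230.0, 10.0, 260.0, 40.0)]
for x0, y0 in tile_origins(448, 448, 224):
    print((x0, y0), boxes_for_tile(boxes, x0, y0, 224, 224))
```

At test time the same origins can be added back to each detection to return to full-image coordinates before stitching.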


There is an answer by Alexey, but I do not know how to use it. In issue #4528 (https://github.com/AlexeyAB/darknet/issues/4528), @Alexey mentioned: "After combining the images you just should call do_nms_sort(dets, num, meta.classes, nms) function."
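To illustrate what non-max suppression does after the per-crop detections are merged, here is a generic greedy NMS sketch in plain Python. It is not darknet's do_nms_sort and does not call it. It assumes detections are already in full-image pixel coordinates as (box, score, class_id) tuples, and the 0.45 threshold is an arbitrary example value.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float, int]        # (box, score, class_id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_nms(dets: List[Detection], iou_threshold: float = 0.45) -> List[Detection]:
    """Keep the highest-scoring box; drop same-class boxes that overlap it too much."""
    kept: List[Detection] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# The same drone detected in two neighbouring crops, plus one unrelated detection.
merged = [
    ((100.0, 100.0, 130.0, 130.0), 0.9, 0),
    ((102.0, 101.0, 131.0, 130.0), 0.6, 0),
    ((500.0, 500.0, 530.0, 530.0), 0.8, 0),
]
print(greedy_nms(merged))
```

The duplicate with the lower score is suppressed, while the distant detection is kept.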

CySlider commented 1 year ago

Hmm, but what if the object you want to detect lies along a cut? Shouldn't the sub-images overlap a little?

mgupta70 commented 1 year ago

That's a good catch, @CySlider. To handle this, you can divide the images into overlapping patches, with 25% or 50% overlap or whatever suits you. You will also need to modify the labeling rule: if an object sits on a patch boundary and gets split, don't keep its BBOX for a crop that contains less than 90% of it (or a threshold of your choice). For detection, run the model in a sliding-window fashion with overlap, then apply non-max suppression to get rid of multiple BBOXes for the same object.
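A minimal sketch of the overlapping-patch rule, assuming pixel-coordinate boxes (x_min, y_min, x_max, y_max), a fractional overlap between neighbouring windows, and the 90% visibility threshold from the comment above. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def window_starts(length: int, tile: int, overlap: float) -> List[int]:
    """Start offsets along one axis for windows of size `tile`.

    `overlap` is the fraction of a window shared with its neighbour.
    A final window is added flush with the border if needed.
    """
    if tile >= length:
        return [0]
    stride = max(1, round(tile * (1.0 - overlap)))
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)
    return starts


def visible_fraction(box: Box, x0: int, y0: int, tile_w: int, tile_h: int) -> float:
    """Fraction of the box area that falls inside the window."""
    inter_w = max(0.0, min(box[2], x0 + tile_w) - max(box[0], x0))
    inter_h = max(0.0, min(box[3], y0 + tile_h) - max(box[1], y0))
    area = (box[2] - box[0]) * (box[3] - box[1])
    return (inter_w * inter_h) / area if area > 0 else 0.0


def keep_box(box: Box, x0: int, y0: int, tile_w: int, tile_h: int,
             min_visible: float = 0.9) -> bool:
    """Keep the label only if enough of the object is inside the window."""
    return visible_fraction(box, x0, y0, tile_w, tile_h) >= min_visible


# A 30x30 object straddling the right edge of the first 400-pixel window.
box = (390.0, 100.0, 420.0, 130.0)
for x0 in window_starts(1000, 400, 0.25):
    print(x0, keep_box(box, x0, 0, 400, 400))
```

The object is rejected in the window that cuts it and kept in the overlapping neighbour that contains it fully, which is the point of the overlap.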

ghost commented 1 year ago

We want to train on a huge dataset of 4K images that have small targets scattered throughout. We have completed annotating these images, but we got low accuracy with the 4K images. Is there any way other than dividing the 4K images? Dividing might lead to the loss of targets.