can i train Yolo with an imageset larger than 1920x1080

pjreddie / darknet

Convolutional Neural Networks

http://pjreddie.com/darknet/

can i train Yolo with an imageset larger than 1920x1080 #368

Open mkhegazy opened 6 years ago

mkhegazy commented 6 years ago

Hello, I am fairly new to YOLO and wanted to ask a question. I am in the process of drawing bounding boxes using YOLO_Mark. I have a set of around 200 images taken with my phone, so they are usually around 4032x3024. So the question is: should I convert these images to a lower resolution (1000x1000 or even lower), or can I use them as they are at 4032x3024 without it being an issue?

baristahell commented 6 years ago

There are several issues. First, you have to compute your proposed anchors to fit the larger final grid and input resolution, so that all of your objects are still visible to the network. Then, will you have enough memory to store the corresponding net weights and activations while training and testing? I'd say, if you can find a way to cut each image into 4 tiles without impeding your goal (boxes cut in half and annoyances like that), definitely go for it.
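A minimal sketch of the "cut them in 4" idea, assuming darknet-style labels (class, x_center, y_center, width, height, all normalized to the image size). The function name `split_into_quadrants` and the `min_visible` threshold are hypothetical and only for illustration; a box that straddles a cut is clipped to each tile, and the clipped piece is dropped when too little of the original box remains.

```python
def split_into_quadrants(boxes, min_visible=0.5):
    """Reassign normalized YOLO boxes to the four quadrants of an image.

    boxes: list of (cls, xc, yc, w, h), normalized to the full image.
    Returns {(col, row): [(cls, xc, yc, w, h), ...]}, with each box
    renormalized to its own tile.
    """
    tiles = {(c, r): [] for r in (0, 1) for c in (0, 1)}
    for cls, xc, yc, w, h in boxes:
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        x0, x1 = xc - w / 2, xc + w / 2
        y0, y1 = yc - h / 2, yc + h / 2
        for (c, r), out in tiles.items():
            # Tile extent in full-image normalized coordinates.
            tx0, ty0 = c * 0.5, r * 0.5
            tx1, ty1 = tx0 + 0.5, ty0 + 0.5
            # Intersection of the box with this tile.
            ix0, ix1 = max(x0, tx0), min(x1, tx1)
            iy0, iy1 = max(y0, ty0), min(y1, ty1)
            if ix1 <= ix0 or iy1 <= iy0:
                continue  # box does not touch this tile
            visible = (ix1 - ix0) * (iy1 - iy0) / (w * h)
            if visible < min_visible:
                continue  # too little of the object left after the cut
            # A tile is half the image in each direction, so scale by 2.
            out.append((
                cls,
                ((ix0 + ix1) / 2 - tx0) * 2,
                ((iy0 + iy1) / 2 - ty0) * 2,
                (ix1 - ix0) * 2,
                (iy1 - iy0) * 2,
            ))
    return tiles
```

With the default threshold, a box centred exactly on the image centre is dropped from all four tiles, which is the "boxes cut in half" annoyance mentioned above; overlapping tiles would be one way around that, at the cost of more images to train on.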

mkhegazy commented 6 years ago

OK. Since I am running into the memory issue in the first place, I have no option other than decreasing the image dimensions. Should I stick to 800x800?

baristahell commented 6 years ago

Be careful when you resize not to crush your small objects. I'm using 1920x1080 images that are resized to 832x416 to get a 26x13 grid; I guess 832x832 (so a 26x26 grid) would be good for you.

mkhegazy commented 6 years ago

OK great, thank you very much. I have started working on the resize (around 200 images for a start :D). When I am done I will let you know how it went. Thank you again.

lqian commented 6 years ago

@baristahell An interesting question about the grid number: as you mention, 832x416 gives a 26x13 grid. The YOLO net is configured with 416x416 as input, and its config file does not have any parameter for the grid number. Does the YOLO net compute the grid number automatically?

baristahell commented 6 years ago

Yes, the grid number is "set" according to your input size. That is because the successive conv/pooling layers reduce the feature map to 1/32 of the input size. It's also why, when computing your box centroids for the anchors in the region layer, you don't keep them as a fraction of the original image size but rather as their size on the cell grid.
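One common way to get such anchors is k-means over the labelled box widths and heights, using 1 - IoU as the distance, as described in the YOLOv2 paper. The sketch below is not necessarily what was used in this thread: it assumes box sizes normalized to [0, 1] (as in darknet label files), a stride of 32, and a simple deterministic initialization; `kmeans_anchors` and `iou_wh` are hypothetical names.

```python
def iou_wh(a, b):
    """IoU of two boxes given as (w, h), aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes_wh, k, net_w, net_h, stride=32, iters=100):
    """Cluster normalized (w, h) pairs into k anchors in grid-cell units.

    Needs at least k boxes. Returns the anchors sorted by area.
    """
    gw, gh = net_w / stride, net_h / stride  # grid columns and rows
    # Express every box on the cell grid rather than as an image fraction.
    pts = sorted(((w * gw, h * gh) for w, h in boxes_wh),
                 key=lambda b: b[0] * b[1])
    # Deterministic init: k boxes spread evenly over the area-sorted list.
    cents = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            # Highest IoU is the same as smallest (1 - IoU) distance.
            best = max(range(k), key=lambda i: iou_wh(p, cents[i]))
            groups[best].append(p)
        new = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else c  # keep an empty cluster's centroid unchanged
            for g, c in zip(groups, cents)
        ]
        if new == cents:
            break
        cents = new
    return sorted(cents, key=lambda b: b[0] * b[1])

# Made-up box sizes, for illustration only.
sample = [(0.05, 0.05), (0.06, 0.06), (0.50, 0.50), (0.52, 0.50)]
anchors = kmeans_anchors(sample, k=2, net_w=832, net_h=832)
print("anchors = " + ", ".join(f"{w:.2f},{h:.2f}" for w, h in anchors))
```

Because the anchors are in grid cells, they have to be recomputed whenever the network input size changes, even if the dataset stays the same.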

mkhegazy commented 6 years ago

I used the application called Yolo_mark. Interestingly, it changed the size from 4032x3024 to 1000x800.

MaxK94 commented 6 years ago

@baristahell Can you give more info about how to compute the anchor boxes? I'm having some trouble understanding this :(

kmsravindra commented 6 years ago

@baristahell, about this comment "Be careful when you resize, to not crush your small objects. I'm using 1920x1080 images that are resized to 832x416 to get a 26x13 grid, i guess 832x832 (so a 26x26 grid) would be good for you."

When you resize from 1920x1080 to 832x416, doesn't the aspect ratio change? Were the annotations done after the resize? Just wondering whether you used the network to predict on 832x416 itself or on 1920x1080. Thanks!
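On the annotation part of this question: darknet label files store each box as a fraction of the image width and height, so, assuming the whole frame is resized without cropping, the same label stays valid before and after the resize. A small sketch of that round trip (the helper names `to_yolo` and `to_pixels` are made up for illustration):

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Pixel corners -> normalized (x_center, y_center, width, height)."""
    return (
        (x_min + x_max) / 2 / img_w,
        (y_min + y_max) / 2 / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    )

def to_pixels(xc, yc, w, h, img_w, img_h):
    """Normalized box -> pixel corners at any image resolution."""
    return (
        (xc - w / 2) * img_w,
        (yc - h / 2) * img_h,
        (xc + w / 2) * img_w,
        (yc + h / 2) * img_h,
    )

# A box annotated on the original 1920x1080 frame...
label = to_yolo(480, 270, 960, 540, 1920, 1080)
print(label)  # (0.375, 0.375, 0.25, 0.25)

# ...lands on the matching pixels of the 832x416 version.
print(to_pixels(*label, 832, 416))  # (208.0, 104.0, 416.0, 208.0)
```

The box does get squashed along with the image (480x270 px becomes 208x104 px here), which is why the aspect-ratio change matters for anchors even though the label file itself does not change.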

matanhs commented 5 years ago

Hi, I'm also trying to increase the image size for tiny yolo-v2 on the Pascal dataset. I initialized the parameters from the 416x416 checkpoint and recomputed the anchors accordingly (the new input size is arbitrarily set to 960x960). However, the quality of the model is underwhelming. Would anyone care to share the accuracy of their large-image model?

jorgegaticav commented 3 years ago

@baristahell, about this comment "There are several issues : First, you have to compute your proposed anchors to fit the larger final grid and input resolution so all of your objects are still visible by the network. Then, will you have enough memory to store the corresponding net weights and activations while training and testing? I'd say, if you can find a way to cut them in 4 without impeding your goal (boxes cut in half and annoyances like that), definitely go for it."

Do you have more info about how to compute the anchors? I'm using 2048x2048 resolution images.