pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

feature pyramids #555

Open ghost opened 6 years ago

ghost commented 6 years ago

In YOLOv3 you use the concept of feature pyramid networks. Can you explain why 'feature pyramid' detectors like DSSD and RetinaNet outperform 'feature pyramid' detectors like SSD for object detection (not segmentation)? Thanks!

pjreddie commented 6 years ago

Feature pyramids use backwards connections from later layers to merge information with earlier layers for predicting detections at higher resolution. This allows them to use both strong semantic signals from later layers and fine grained spatial features from earlier layers. Pretty cool design!
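The backward connection described above can be sketched in a few lines: upsample the coarse, semantically strong map and merge it with the finer, higher-resolution map. This is a minimal NumPy sketch, not darknet's implementation; note that classic FPN merges by element-wise addition, while YOLOv3's route layer merges by channel concatenation, which is what is shown here.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def topdown_merge(coarse, fine):
    # Upsample the low-resolution, semantically strong map and
    # concatenate it with the high-resolution, spatially fine map
    # along the channel axis (YOLOv3-style route; FPN would add instead).
    return np.concatenate([upsample2x(coarse), fine], axis=0)

# Toy maps: an 8-channel 13x13 map merged with an 8-channel 26x26 map.
coarse = np.ones((8, 13, 13))
fine = np.zeros((8, 26, 26))
merged = topdown_merge(coarse, fine)
print(merged.shape)  # (16, 26, 26)
```

Predictions made on `merged` see both the later layer's semantics and the earlier layer's spatial detail.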

ghost commented 6 years ago

I have a question: with the same clustering method on the same dataset, YOLOv3's anchor scales come out many times larger than YOLOv2's. Is there any way to make them have the same scale?

pjreddie commented 6 years ago

So in YOLOv2 I made some design-choice errors: I made the anchor box size relative to the feature map size in the last layer. Since the network downsamples by 32, anchor sizes were relative to 32-pixel cells, so a 9x9 anchor was actually 288px x 288px.

In YOLOv3, anchor sizes are actual pixel values. This simplifies a lot of stuff and was only a little bit harder to implement.
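The conversion between the two conventions is just multiplication by the stride. A small sketch of the arithmetic described above (the helper name is made up for illustration):

```python
# YOLOv2 anchors are expressed in last-layer feature-map cells; the
# network downsamples by 32, so multiplying by that stride gives the
# YOLOv3-style pixel values.
STRIDE = 32

def v2_anchors_to_pixels(anchors, stride=STRIDE):
    # anchors: list of (w, h) in feature-map cells -> list of (w, h) in pixels
    return [(w * stride, h * stride) for (w, h) in anchors]

# The 9x9 example from the comment above: 9 cells * 32 px/cell = 288 px.
print(v2_anchors_to_pixels([(9, 9)]))  # [(288, 288)]
```

Going the other way (dividing pixel anchors by 32) would put YOLOv3 anchors back on YOLOv2's scale, which answers the earlier question about making the two sets comparable.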

xiayq1 commented 5 years ago

Sorry to bother you. In my training I set random=0, using VOC data, and after only 10000 steps YOLO can detect objects. But if I set random=1, also for 10000 steps, YOLO can't detect anything. I'm also confused about why YOLO is trained with different input sizes: what benefit does multi-scale input bring? If I use a fixed input size, I would expect that to be better than training on multiple sizes.
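For context on what random=1 does: darknet periodically resizes the square network input to a random multiple of 32 (roughly 320 to 608 for the default cfg), so the model learns to detect at several scales instead of overfitting to one resolution. This is a sketch of the idea, not the exact darknet code:

```python
import random

def pick_input_size(seed=None):
    # With random=1, every few batches darknet picks a new square input
    # size that is a multiple of 32, roughly in [320, 608].
    # (Illustrative sketch; the real code lives in darknet's C sources.)
    rng = random.Random(seed)
    return (rng.randrange(10) + 10) * 32

# All candidate sizes are multiples of the 32-pixel stride.
sizes = {pick_input_size(s) for s in range(100)}
print(sorted(sizes))
```

The trade-off hinted at in the question is real: multi-scale training converges more slowly (each scale gets fewer effective steps, which may explain the weak 10000-step result with random=1), but the final model generalizes across input resolutions.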

computervisionlearner commented 4 years ago

> So in YOLOv2 I made some design-choice errors: I made the anchor box size relative to the feature map size in the last layer. Since the network downsamples by 32, anchor sizes were relative to 32-pixel cells, so a 9x9 anchor was actually 288px x 288px.
>
> In YOLOv3, anchor sizes are actual pixel values. This simplifies a lot of stuff and was only a little bit harder to implement.

Hi, I guess the YOLOv3 anchor sizes are relative to the input size in the cfg file (e.g. 608x608), not the original image size, right? During the multi-scale training process (as),
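On the question of how pixel anchors interact with multi-scale training: in darknet's yolo layer the predicted box width is proportional to exp(tw) times the anchor width, normalized by the current network input width, so a fixed pixel anchor covers a larger fraction of the image when the input is smaller. A hedged sketch of that relationship (the function name is made up for illustration):

```python
import math

def box_w_fraction(tw, anchor_w, net_w):
    # Predicted box width as a fraction of the image, following the
    # yolo-layer form: exp(tw) * anchor_w / net_w. Under multi-scale
    # training anchor_w stays fixed in pixels while net_w varies.
    return math.exp(tw) * anchor_w / net_w

a = 116  # one of the default YOLOv3 anchor widths, in pixels
print(box_w_fraction(0.0, a, 608))  # anchor's fraction at a 608 input
print(box_w_fraction(0.0, a, 416))  # a larger fraction at a 416 input
```

So yes, the anchors are defined in pixels at the cfg's nominal input size, and their effective relative size shifts as the multi-scale trainer changes the network resolution.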