Open youssefzaky opened 7 years ago
I don't think this code is necessary if you follow #23. You can remove all of these default numbers and just initialize the arrays to zeros:
_boxes = np.zeros([hw, num_anchors, 4], dtype=np.float64)
_box_mask = np.zeros([hw, num_anchors, 1], dtype=np.float64)
In my experiments (with and without these default values), the trained model's mAP is almost the same.
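To illustrate the difference between the two initializations, here is a minimal sketch. It assumes a simple mask-weighted squared-error box loss, which may not match this project's exact loss; `masked_box_loss`, the 13x13 grid, and the 5 anchors are hypothetical choices for illustration only.

```python
import numpy as np

def masked_box_loss(pred, target, mask):
    """Squared box error summed over all anchors, weighted by the mask (assumed loss form)."""
    return float(np.sum(mask * (pred - target) ** 2))

hw, num_anchors = 13 * 13, 5  # assumed grid size and anchor count

# Zero-initialized targets and mask, as suggested above.
_boxes = np.zeros([hw, num_anchors, 4], dtype=np.float64)
_box_mask = np.zeros([hw, num_anchors, 1], dtype=np.float64)

# Arbitrary predictions; no anchor is matched to a ground-truth box here.
rng = np.random.default_rng(0)
pred = rng.random((hw, num_anchors, 4))

# With a zero mask, unmatched anchors contribute nothing to the box loss.
zero_mask_loss = masked_box_loss(pred, _boxes, _box_mask)

# Default values: target (0.5, 0.5, 1, 1) with weight 0.01 on every anchor.
default_boxes = np.tile(np.array([0.5, 0.5, 1.0, 1.0]), (hw, num_anchors, 1))
default_mask = np.full([hw, num_anchors, 1], 0.01)

# With the defaults, every unmatched anchor is pulled weakly toward its default box.
default_loss = masked_box_loss(pred, default_boxes, default_mask)
```

Under this assumed loss, the zero mask makes the target values irrelevant for unmatched anchors, while the 0.01 weights add a small extra term for each of them.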
I think these numbers come from:
delta_region_box
coord_scale * (2 - truth.w*truth.h)
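As a rough worked example of the weighting expression above, the sketch below evaluates it for a small and a large box. It assumes `truth.w` and `truth.h` are normalized to the range 0 to 1 and uses `coord_scale = 1.0` as a placeholder; `coord_weight` is a hypothetical helper name, not a function from the original code.

```python
def coord_weight(truth_w, truth_h, coord_scale=1.0):
    """Evaluate coord_scale * (2 - truth.w * truth.h) for one ground-truth box."""
    return coord_scale * (2.0 - truth_w * truth_h)

# Assumed normalized box sizes (fractions of the image width and height).
small_box = coord_weight(0.1, 0.1)  # close to 2 * coord_scale
large_box = coord_weight(1.0, 1.0)  # equal to 1 * coord_scale
```

Under these assumptions, smaller boxes receive a larger coordinate weight than boxes that cover most of the image.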
I'm not really sure why YOLO's author did it. net.seen is the number of images the network has seen so far during training. 12800 is roughly 0.8 epoch (VOC07+12 has 16,551 images). Maybe it's a kind of curriculum learning, I guess.
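The epoch estimate can be checked with a couple of lines. This sketch assumes the commonly cited trainval sizes of 5,011 images for VOC2007 and 11,540 for VOC2012; `epochs_seen` is a hypothetical helper name.

```python
def epochs_seen(images_seen, dataset_size):
    """Fraction of one pass over the dataset covered by images_seen."""
    return images_seen / dataset_size

voc07_trainval = 5011   # assumed VOC2007 trainval size
voc12_trainval = 11540  # assumed VOC2012 trainval size

# 12800 is the net.seen threshold mentioned above.
fraction = epochs_seen(12800, voc07_trainval + voc12_trainval)
```

With these counts the threshold falls at about 0.77 of an epoch, which matches the "roughly 0.8" figure.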
A question about the implementation of targets in this project. Why do the default box targets have values of 0.5 for x and y and 1 for width and height? And why does the box mask have default weights of 0.01? Does this give predicted boxes that are unmatched to ground truth an incentive to simply predict the prior boxes? Is this in the original implementation?