mrlooi / rotated_maskrcnn

Rotated Mask R-CNN: From Bounding Boxes to Rotated Bounding Boxes

MIT License

350 stars, 62 forks

Question? #8

Closed WeihongM closed 5 years ago

WeihongM commented 5 years ago

❓ Questions and Help

Hello, thanks for sharing your project. I have read through your .py files, but I am confused about the parameter make_width_larger in anchor_generator.py. Can you explain why it matters in the anchor generation process? Thanks.

WeihongM commented 5 years ago

@mrlooi

mrlooi commented 5 years ago

Hey, I believe I set that parameter to False (see modeling/rrpn/utils.py), so it doesn't matter much.

Originally, I wrote it because I wanted consistency with rotated anchors, whereby the width is always >= height. But then later I realized I didn't really need it.

WeihongM commented 5 years ago

@mrlooi Thanks for your reply! In the RRPN paper, the angle range is -pi/6 to 2pi/3, which differs from the -pi/4 to pi/4 range in your project. Can you explain the reason for this setting? Also, I notice that theta and theta + pi/2 describe nearly the same box when the detected object has width close to height, whereas text is usually long and does not have this ambiguity. I think this may be why you set the angle range to -pi/4 to pi/4. Please correct me if I have misunderstood, thanks! Another question: if I want to predict text, for example on the ICDAR2015 dataset, can I modify the project easily to do this?

mrlooi commented 5 years ago

I did it to prevent angle ambiguity for objects where the height and width are similar. Imagine a bounding box where width is similar to height. When you introduce angles, this bounding box could have angle = 0 or angle = pi/2. Both angles can make a valid rotated box, but the problem is that if this is not constrained, the model might converge to output the mean angle, which is pi/4 (wrong). Hence the term angle ambiguity.

By restricting the angle range to a span of pi/2 (-pi/4 to pi/4), I confine the remaining ambiguity to the -pi/4 and pi/4 tail ends and enforce better consistency for angle regression.
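
To make the restriction concrete, here is a minimal sketch (not code from this repository) of how any rotated box can be rewritten with an angle in [-pi/4, pi/4). It relies only on the fact that rotating a rectangle by pi/2 while swapping its width and height leaves the rectangle unchanged. The function name normalize_rotated_box is hypothetical.

```python
import math

def normalize_rotated_box(w, h, theta):
    """Return an equivalent (w, h, theta) with theta in [-pi/4, pi/4).

    Hypothetical helper for illustration; not part of rotated_maskrcnn.
    """
    quarter_turn = math.pi / 2
    # Number of quarter turns to remove so the angle lands in [-pi/4, pi/4).
    k = math.floor(theta / quarter_turn + 0.5)
    theta -= k * quarter_turn
    # An odd number of quarter turns exchanges the roles of width and height.
    if k % 2 == 1:
        w, h = h, w
    return w, h, theta

# A square box at pi/2 gets the same representation as the box at angle 0.
print(normalize_rotated_box(2.0, 2.0, math.pi / 2))
# A wide box at -2*pi/3 becomes a tall box at -pi/6.
print(normalize_rotated_box(4.0, 1.0, -2 * math.pi / 3))
```

With every box mapped into a single pi/2-wide interval, a square-ish object has only one valid regression target instead of two, which is the consistency described above.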

Of course there are other ways to do this, but I find that this approach generalizes better than other RRPN-based papers, which were designed for OCR/remote-sensing applications (where objects like text almost always have large width/height ratios).

mrlooi commented 5 years ago

If you want to train on other datasets, have a look at "Training Your Own Dataset" in the README. In fact, I have included an ICDAR 2015 example.

WeihongM commented 5 years ago

@mrlooi Hello, thanks for your reply. I have visualized the generated anchors. Suppose we want to detect a text instance with a large width/height ratio whose angle is -2pi/3. If the anchor angles only range from -pi/4 to pi/4, can we still get positive anchors (anchors whose IoU with the GT exceeds the threshold)?

mrlooi commented 5 years ago

Make sure the ASPECT_RATIOS are symmetric around 1.0, so that each ratio is paired with its reciprocal, e.g. 0.25, 0.5, 1.0, 2.0, 4.0.

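E.g. if we have (theta, w, h), an aspect ratio of 4 would be (theta, 4, 1), and 0.25 would be (theta, 1, 4). This would give you the anchor you need in your example: a wide box at -2pi/3 is the same rectangle as a tall box at -2pi/3 + pi/2 = -pi/6, which lies inside the anchor angle range.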
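
To check this numerically, the sketch below enumerates anchors and looks for one that coincides with the box from the question. It is not the repository's anchor_generator.py: make_rotated_anchors, corners and same_rectangle are hypothetical helpers, the anchor size and the angle list are illustrative values only, and the ratio is assumed to mean w / h, as in the (theta, 4, 1) example.

```python
import math

def make_rotated_anchors(size, aspect_ratios, angles):
    """Enumerate (theta, w, h) anchors; ratio is taken as w / h, area is size**2."""
    anchors = []
    for theta in angles:
        for ratio in aspect_ratios:
            w = size * math.sqrt(ratio)
            h = size / math.sqrt(ratio)
            anchors.append((theta, w, h))
    return anchors

def corners(theta, w, h, cx=0.0, cy=0.0):
    """Return the four corner points of a rotated box, sorted for comparison."""
    c, s = math.cos(theta), math.sin(theta)
    pts = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        pts.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return sorted(pts)

def same_rectangle(a, b, tol=1e-9):
    """True if two (theta, w, h) boxes centred at the origin share all corners."""
    return all(
        math.isclose(p, q, abs_tol=tol)
        for pa, pb in zip(corners(*a), corners(*b))
        for p, q in zip(pa, pb)
    )

ASPECT_RATIOS = (0.25, 0.5, 1.0, 2.0, 4.0)
ANGLES = (-math.pi / 4, -math.pi / 6, 0.0, math.pi / 6)  # illustrative values only
anchors = make_rotated_anchors(2.0, ASPECT_RATIOS, ANGLES)

# Ground-truth box from the question: wide (w=4, h=1) at angle -2*pi/3.
gt = (-2 * math.pi / 3, 4.0, 1.0)
matches = [a for a in anchors if same_rectangle(a, gt)]
print(matches)
```

The only anchor that coincides with the ground-truth box is the one built from ratio 0.25 at -pi/6, which is why dropping the ratios below 1.0 (or forcing width >= height) would leave such a box without a well-aligned anchor.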

WeihongM commented 5 years ago

Thanks for your reply. I found the problem: the default value of the make_width_larger parameter in the function is True. I set it to False, which fixed the visualization.