It's been a while since I wrote that documentation, so I'm racking my memory to remember why we chose that.
From what I can remember, the goal was to provide the classifier with undistorted images whenever possible. The bounding boxes provided by MegaDetector range from ~20 to ~1000 pixels per side, with all possible aspect ratios. (For example, sometimes only a small sliver of an animal appears on the side of an image, and then the bounding box may have size 20x500 (width x height).) We found (anecdotally) that stretching very narrow boxes to become square resulted in lower performance.
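In pseudocode, the square-crop behavior amounts to roughly the following. This is just a minimal Pillow sketch: it assumes a hypothetical `(left, top, width, height)` box in absolute pixels (MegaDetector itself emits normalized coordinates), and the actual cropping script may handle edge cases differently.

```python
from PIL import Image

def crop_tightest_square(img: Image.Image, box) -> Image.Image:
    """Crop the tightest square enclosing a detection box, clipped to
    the image bounds so no padding is needed. `box` is (left, top,
    width, height) in absolute pixels (a placeholder format)."""
    left, top, w, h = box
    # Side of the tightest enclosing square, capped by the image itself.
    side = min(max(w, h), img.width, img.height)
    # Center the square on the box, then shift it back inside the image,
    # so the extra context comes from real image background.
    x0 = max(0, min(left + (w - side) // 2, img.width - side))
    y0 = max(0, min(top + (h - side) // 2, img.height - side))
    return img.crop((x0, y0, x0 + side, y0 + side))
```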
We did not try adding zero-padding, but our reasoning was that image background surrounding the animal would probably be more helpful for the classifier than zero-padding.
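For completeness, the zero-padding alternative we didn't try would look something like this (again just a sketch, not code we ever ran):

```python
from PIL import Image

def pad_to_square(crop: Image.Image, fill=(0, 0, 0)) -> Image.Image:
    """Letterbox an arbitrary-aspect-ratio crop onto a square black
    canvas instead of pulling in surrounding image background."""
    side = max(crop.width, crop.height)
    canvas = Image.new("RGB", (side, side), fill)
    # Paste the crop centered; everything outside it stays zero-valued.
    canvas.paste(crop, ((side - crop.width) // 2, (side - crop.height) // 2))
    return canvas
```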
We did not try adjusting the size of the bounding box.
I hope this helps!
I think Natty just signed up to do an A/B comparison of both approaches and write a paper about it! This would be a great topic for this event, which, BTW, Natty, you + team should totally be presenting at. :)
If I had to place a bet on what Natty will find when he does this experiment :), I think the right answer will be something like "square crop, but expand that square by a bunch to make sure there's some background visible".
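Relative to the square-crop logic above, that change would be small; something like the sketch below, where the 20% expansion is a completely made-up number (figuring out the right value is exactly what the experiment is for):

```python
from PIL import Image

def expanded_square_crop(img: Image.Image, box, expand: float = 0.2) -> Image.Image:
    """Square crop whose side is enlarged by `expand` (a fraction) so
    some real background is visible around the animal; clipped to the
    image bounds rather than padded."""
    left, top, w, h = box
    side = min(int(max(w, h) * (1 + expand)), img.width, img.height)
    cx, cy = left + w / 2, top + h / 2  # box center
    x0 = int(max(0, min(cx - side / 2, img.width - side)))
    y0 = int(max(0, min(cy - side / 2, img.height - side)))
    return img.crop((x0, y0, x0 + side, y0 + side))
```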
I'll close this issue since Chris, who wrote that sentence, is the world's expert on it, but feel free to keep discussing. Great topic for the AI for Conservation Slack too!
Thanks for the info @chrisyeh96! And Dan, perhaps I will try to A/B test the two approaches... seems straightforward enough to evaluate.
I guess I'm a little surprised by both of your answers, however. I was under the impression that too much background leads to poor generalization? I recently came across a classifier that did what you're describing, Dan (added a 10% buffer to all sides of the crop to intentionally include some background), and my initial thought was that that might actually be counterproductive?
Cool-sounding workshop too! I'll take a look at the call for papers and see if we can't put something together and throw our hat in the ring :).
I would agree that "too much" background may lead to poor generalization (although I don't know that we've proven that per se). But the definition of "too much" is definitely up for debate. Maybe 75 pixels on each side is good (so 150 more vertical and 150 more horizontal pixels)? IMO the downside of using too many pixels is maybe less about losing generalization, and more about losing resolution on the animal itself once the crop is resized down to the classifier's input size, which especially hurts when you're classifying small animals.
In the crop images step of the classifier training workflow, the docs recommend the `--square-crops` approach. I've seen other camera trap classifiers do the opposite, i.e., distorting the image to coerce the arbitrary aspect ratio of the crop into a square, rather than adding padding or making the bounding box a bit taller/wider to get to the desired input size. I'm just curious if anyone has played around with comparing the two approaches and why the docs recommend `--square-crops`.
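Concretely, by "distorting" I mean something like the following sketch (224 is just a stand-in for whatever the model's input size happens to be):

```python
from PIL import Image

def stretch_to_square(img: Image.Image, box, size: int = 224) -> Image.Image:
    """Crop the exact bounding box and stretch it to a square network
    input, ignoring the original aspect ratio. `box` is (left, top,
    width, height) in absolute pixels."""
    left, top, w, h = box
    crop = img.crop((left, top, left + w, top + h))
    return crop.resize((size, size), Image.BILINEAR)
```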