Continuing the training of MegaDetector

vlas-sokolov commented 3 years ago

Hi all! Thanks a lot for making this fantastic work available; the collective usefulness of the tools provided is truly staggering.

I was trying out MegaDetector and it's working really well. However, it consistently misses some smaller animals that are partially obscured in the grass (so, false negatives), and it sometimes picks up inanimate objects like tree stumps as animals (false positives). Confidence thresholding doesn't help to weed out those false positives, as it comes at a high price - basically, I'd have to set a threshold close to unity and miss out on a lot of robust animal detections that way.
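To make the trade-off concrete, here is a minimal sketch of confidence thresholding. It assumes results shaped like a dictionary with an `images` list, where each image has a `detections` list and each detection has a `conf` value; the sample data and the helper names `filter_detections` and `count_detections` are made up for illustration and are not part of the repo.

```python
def filter_detections(results, threshold):
    """Return a copy of `results` keeping only detections with conf >= threshold."""
    filtered = []
    for image in results["images"]:
        kept = [d for d in image.get("detections", []) if d["conf"] >= threshold]
        filtered.append({**image, "detections": kept})
    return {**results, "images": filtered}


def count_detections(results):
    """Total number of detections across all images."""
    return sum(len(image["detections"]) for image in results["images"])


# Toy data, invented for illustration only.
sample_results = {
    "images": [
        {"file": "cam01/img001.jpg", "detections": [
            {"category": "1", "conf": 0.97, "bbox": [0.10, 0.20, 0.30, 0.25]},  # clear animal
            {"category": "1", "conf": 0.55, "bbox": [0.60, 0.70, 0.05, 0.04]},  # small animal in grass
        ]},
        {"file": "cam01/img002.jpg", "detections": [
            {"category": "1", "conf": 0.91, "bbox": [0.40, 0.50, 0.12, 0.15]},  # tree stump
        ]},
    ]
}

for threshold in (0.5, 0.95):
    remaining = count_detections(filter_detections(sample_results, threshold))
    print(f"threshold={threshold}: {remaining} detections kept")
```

In this toy case, raising the threshold far enough to drop the stump also drops the small animal, which is exactly the problem described above.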

I see that there are checkpoints available for the 4.1 detector version, and was wondering if there's a code snippet one can use for continuing the training?

Alternatively (a bit of a general question, sorry), are there any good strategies to mitigate the issues above?

agentmorris commented 3 years ago

Thanks for your interest in MegaDetector.... we don't have a code snippet for resuming training, unfortunately, but I'm not sure I would recommend that anyway. MegaDetector v4 is based on TensorFlow 1.x, which is starting to get pretty out of date, and you'd probably have to do a bunch of work to manage an outdated training environment. We will be releasing a new version later this year that will (with very high probability) move to TF 2.x.

Regarding the false-negative/false-positive balance... a couple things I would add:

1) There definitely exist ecosystems and species that just give MDv4 a hard time. These are exactly what we're trying to address with MDv5! In fact one of our MDv5 training data sets is basically "tiny squirrels hiding in tall grass". :) But for now, there's always a possibility that you simply have animals MD can't reliably find, and MD isn't the solution for you.

2) That said, I would start by seeing whether there are lower-confidence detections on those animals. If you're getting those detections at 50% confidence, that may be fine, which brings us to...

3) Regarding false positives... first, we encourage users to think of MD as a way to accelerate image review, not to automate it, so you will bang your head against the problem if you aim for 0% false positives. We typically try to get to a level of false positives that will still save our users more time than it takes to add a new step into their workflow. Often this is somewhere around 80% precision, as long as we can get very high recall at that level of precision. So if you're somewhere around there, I'm not sure I would spend too much time trying to get rid of every false positive.

4) That said, this very inelegant technique has been magical for repeated false positives due to sticks, rocks, etc.:

https://github.com/microsoft/CameraTraps/tree/master/api/batch_processing/postprocessing/repeat_detection_elimination

It takes a little practice, but once you get the hang of it, you can get rid of hundreds of thousands of false positives in just a few minutes of semi-automated image review.
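For readers who want the intuition behind point 4, here is a rough sketch of the general idea: a box that shows up in nearly the same place across many images from the same camera is probably a stick or a rock rather than an animal. This is not the code in the linked folder. The function names, the `iou_threshold` and `occurrence_threshold` parameters, the default values, and the assumption that each folder corresponds to one camera are all hypothetical, and boxes are assumed to be `[x, y, width, height]`.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as [x, y, width, height]."""
    ax1, ay1, aw, ah = a
    bx1, by1, bw, bh = b
    inter_w = max(0.0, min(ax1 + aw, bx1 + bw) - max(ax1, bx1))
    inter_h = max(0.0, min(ay1 + ah, by1 + bh) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def find_suspicious_boxes(images, iou_threshold=0.9, occurrence_threshold=10):
    """Group near-identical boxes per folder and return groups seen often enough."""
    by_location = defaultdict(list)
    for image in images:
        # Assumption: the parent folder of each file identifies the camera.
        location = image["file"].rsplit("/", 1)[0]
        for det in image["detections"]:
            by_location[location].append((image["file"], det["bbox"]))

    suspicious = []
    for location, boxes in by_location.items():
        groups = []  # each group holds a representative bbox and the files it appears in
        for filename, bbox in boxes:
            for group in groups:
                if iou(group["bbox"], bbox) >= iou_threshold:
                    group["files"].append(filename)
                    break
            else:
                groups.append({"bbox": bbox, "files": [filename]})
        for group in groups:
            if len(group["files"]) >= occurrence_threshold:
                suspicious.append({"location": location, **group})
    return suspicious


# Toy data: the same "stump" box in 12 images, plus an animal that moves around.
images = []
for i in range(12):
    detections = [{"conf": 0.9, "bbox": [0.40, 0.50, 0.12, 0.15]}]
    if i % 4 == 0:
        detections.append({"conf": 0.8, "bbox": [0.1 * (i // 4), 0.2, 0.1, 0.1]})
    images.append({"file": f"cam01/img{i:03d}.jpg", "detections": detections})

for group in find_suspicious_boxes(images):
    print(group["location"], group["bbox"], len(group["files"]))
```

The flagged groups are only candidates. The point of the semi-automated review step is that a person still confirms which groups really are false positives before anything is removed.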

Hope that helps! If you find that you're just in one of those situations where MDv4 can't reliably find your animals, stay tuned for MDv5, and report back here!

Thanks.

-Dan

vlas-sokolov commented 3 years ago

Hi Dan, thanks a lot for your reply! I'll be looking forward to the new detector version. On point 3: I'm aware of the precision/recall tradeoff and wasn't expecting a perfect detector - I was simply wondering what tools, tips, and tricks are available to manage that balance. Your reply sums that up nicely.
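As a small illustration of that tradeoff, here is a sketch that computes precision and recall at a few confidence thresholds from a hand-labeled sample. The `labeled` list and the `precision_recall` helper are invented for illustration, and animals that received no detection at all are not counted, so recall here is only relative to the detections that exist.

```python
def precision_recall(scored, threshold):
    """Compute (precision, recall) from (confidence, is_animal) pairs at a threshold."""
    tp = sum(1 for conf, is_animal in scored if conf >= threshold and is_animal)
    fp = sum(1 for conf, is_animal in scored if conf >= threshold and not is_animal)
    fn = sum(1 for conf, is_animal in scored if conf < threshold and is_animal)
    # Convention chosen here: report 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Toy labels, invented for illustration: (detector confidence, was it really an animal?)
labeled = [
    (0.98, True), (0.95, True), (0.92, False), (0.80, True),
    (0.60, True), (0.55, False), (0.30, True),
]

for threshold in (0.5, 0.9, 0.95):
    precision, recall = precision_recall(labeled, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Running a sweep like this on a small reviewed sample is one way to see whether a workable operating point exists before investing in further postprocessing.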

In the meantime, the postprocessing script you've linked to looks absolutely fantastic. It looks like something I was afraid I'd have to write myself. Again, thanks a lot for making these available! :1st_place_medal:

I'm going to close the issue now.

vlas-sokolov commented 3 years ago

Just wanted to chime in with an update in case anyone else stumbles upon this. The script tools linked above worked like a charm with a few tricks:

I had to modify the IOU threshold defaults in repeat_detections_core.py, because the remove_repeat_detections function doesn't allow options to be passed from the command-line driver.
The defaults for find_repeat_detections are quite sensitive to the project setup; in particular, the right occurrenceThreshold value depends heavily on how many images exist for a given location.

Other than this, it was a huge time saver to weed out the false positives!

agentmorris commented 3 years ago

Glad it worked out! Yes, it takes a little fine-tuning and intuition, but it's magical once you get it going. We typically use an occurrenceThreshold of ~10 (below that it isn't really meaningful), but for very large projects (millions of images, with tens of thousands per camera) where we don't really want to spend a long time on the repeat detection process, we'll bump that up to as much as 50.

microsoft / CameraTraps

Continuing the training of MegaDetector #242