Meta-issue: list of open issues, random todo's, and half-baked ideas

agentmorris commented 1 year ago

Sometimes folks ping us to ask how they can contribute code to the MegaDetector project, and we don't really have a place to point them right now. Combined with the fact that a couple of important open issues have been languishing for a few weeks (months?), I got motivated to create this issue as a snapshot of our internal todo list, so I have somewhere to point folks who want to get involved. I'm making only a weak attempt at prioritization here; instead, I'm just trying to sort them into logical buckets.

If you're interested in trying your hand at any of these, email us!

Feature additions for existing scripts/tools

Occasionally users have high-resolution images, and MD is missing some small animals after resizing to the standard input size (1280px). Two things to try here: (a) running the model at the native image size (larger than what MD was trained on), and (b) chopping the image up into 4 smaller images, running MD on each, and recombining the results. We have code for splitting images into smaller images here, and we've used that to manually run MD on the new folder of smaller images, but we've never glued it all together. So, the task here is to glue that all together as an option for run_detector_batch, and the stretch goal is to try that on some tough images with small animals and see how it compares to just running the model on a larger image size.
Allow run_detector_batch.py to use multiple GPUs. If it's easiest to do this by wrapping this in a call that just splits the input in two and recursively calls run_detector_batch.py (rather than maintaining a properly balanced queue), that's totally fine. (Obsolete if someone instead wraps the native YOLOv5 inference scripts to replace run_detector_batch entirely.)
Add proper checkpointing support when running run_detector_batch.py on multiple cores, or multiple GPUs. Currently checkpointing is supported only when running on a single thread.
During the repeat detection elimination process, we currently render a single image per unique detection. For reasons that are really hard to explain unless you've been through this process a few times, it would be really useful to basically make this image twice as wide, with the right half of the image being exactly what we render now, and the left half being a collection of crops representing the top 30-ish images that match this detection. Basically "write a function that takes a 'primary' image, a list of other images, and a list with one bounding box for each of those other images, crops all those bounding boxes out, and tacks them on to one side of the 'primary' image".
Also related to the repeat detection elimination process... currently if you run the "find repeat detections" portion of the pipeline, and decide you just want a different threshold for the number of repeats required to call a detection "suspicious", you have to run the whole process again. This is silly; you should be able to just change the threshold. In fact, even better, you should be able to specify roughly the number of suspicious detections you feel like dealing with (typically around 1000, which is around 5-10 minutes of manual review), and have that threshold determined automatically.
Allow postprocess_batch_results.py to operate on sequences, rather than just images. Sample based on sequences, do precision/recall analysis based on sequences, and render sequences in a sensible way on the output page.
Allow postprocess_batch_results.py and compare_batch_results.py to sort output by confidence (so it's easier to scan for misses).
Allow postprocess_batch_results.py to use different confidence thresholds for different categories
In repeat detection elimination and sequence-based classification smoothing, write the smoothing parameters into the output file.
The json manager app currently hard-codes the expected structure, so we have to keep it up to date with minor additions to the .json format. This should only hard-code parameters it actually needs to operate on, and pass everything else through unmodified. Json.NET supports all the right things, we're just not doing those things right now.
We have a halfway-finished script to convert MD results to COCO format (to train a downstream detector); it would be nice to finish this, and allow output to YOLO format (e.g. using our mostly-finished COCO to YOLO script).
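
To make the "pick the threshold automatically" idea from the repeat detection item above a little more concrete, here is a minimal sketch. It assumes we already have, for each candidate repeat detection, the number of images it occurred on; the function name and its parameters are hypothetical and do not exist in the repo.

```python
def choose_occurrence_threshold(occurrence_counts, target_suspicious=1000, min_threshold=2):
    """Pick the smallest occurrence threshold that keeps review manageable.

    occurrence_counts: one integer per candidate detection, the number of
        images on which that detection repeated (hypothetical input).
    target_suspicious: the most "suspicious" detections we want to review.
    min_threshold: never go below this many repeats.

    A candidate is "suspicious" if its count is >= the returned threshold.
    """
    counts_desc = sorted(occurrence_counts, reverse=True)

    # Everything already fits in the review budget at the lowest threshold
    if len(counts_desc) <= target_suspicious:
        return min_threshold

    # The candidate just past the budget must be excluded, so the threshold
    # has to be one more than its count.
    threshold = counts_desc[target_suspicious] + 1
    return max(min_threshold, threshold)


if __name__ == "__main__":
    counts = [10, 8, 8, 5, 3, 2]
    print(choose_occurrence_threshold(counts, target_suspicious=3))
```

Because this only needs the per-candidate counts, it could run on the saved output of the "find repeat detections" step without repeating the expensive comparison.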

Refactoring or re-writing stuff

postprocess_batch_results.py is an absurd use of Pandas right now, and has an absurd level of duplication between the code paths (with/without ground truth, with/without classification results). This badly needs a re-write from scratch.
repeat_detections_core.py isn't nearly as bad, but it's not ideal, and it has some really bizarre properties right now, like the fact that when you run the main function a second time to apply a set of changes after the manual review step, it repeats all the folder-separation stuff it did the first time, which is brittle and silly. Not quite a total re-write, but a significant cleanup.
The sequence-based classification smoothing process is a useful and relatively standalone piece of code that is currently buried in the notebook I use to run MegaDetector. This should be refactored into its own script, and possibly updated to operate on detection data too.
merge_detections.py - which is useful when you want the "greatest hits" of MDv5a and/or MDv5b and/or MDv4 and/or some other detector someone else trained - currently only looks at whole images, it has no way of saying "MDv5a found one animal in this image that MDv5b missed, even though MDv5b found a bunch of animals, so just merge that one in". Update this to look at individual detections.
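
For the merge_detections.py item, a detection-level merge could look roughly like the sketch below. This is not the current script's logic; it assumes detections are dicts with "category", "conf", and a normalized [x_min, y_min, width, height] "bbox", and the thresholds are placeholder values.

```python
def iou(box_a, box_b):
    """Intersection over union for two [x_min, y_min, width, height] boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    inter_w = max(0.0, min(ax1 + aw, bx1 + bw) - max(ax1, bx1))
    inter_h = max(0.0, min(ay1 + ah, by1 + bh) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def merge_image_detections(target_dets, source_dets, iou_threshold=0.5, min_source_conf=0.2):
    """Add to target_dets any confident source detection the target missed.

    A source detection counts as "already found" if a target detection of the
    same category overlaps it with IoU >= iou_threshold. Both thresholds are
    assumptions for illustration, not tuned values.
    """
    merged = list(target_dets)
    for det in source_dets:
        if det["conf"] < min_source_conf:
            continue
        already_found = any(
            det["category"] == t["category"]
            and iou(det["bbox"], t["bbox"]) >= iou_threshold
            for t in target_dets
        )
        if not already_found:
            merged.append(det)
    return merged
```

Run per image, this keeps everything one detector found and pulls in only the individual boxes from the other detector that have no overlapping match.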

Infrastructure things

A substantial number (most?) of our users prefer R, and we're forcing them to run a bunch of Python code. It would be great to either wrap the inference process in R, or port the inference code to R. IMO it's not urgent to do this for anything other than the inference code (run_detector_batch.py) and maybe separate_detections_into_folders.py.
This repo relies on run_detector_batch.py for inference, but YOLOv5 has its own inference scripts with lots of bells and whistles: standardized methods for precision/recall analysis, export to YOLO and COCO formats, and - especially - fancy test-time augmentation tools. Plus a zillion people have worked on making YOLOv5's inference tools deployable in a zillion different environments. If we forget about MDv4 support for a bit, it would be really nice to see a streamlined path to using YOLOv5's detect.py and/or val.py scripts, including (a) making sure that we get the same results as run_detector_batch, (b) converting the output to the MD output format (including adding 1 to the class indices), and (c) experimenting with YOLOv5's --augment parameter (I'm secretly hoping there's a little more accuracy we can get "for free" using --augment).
As of the time I'm writing this, "megadetector" is still available at PyPI, and we've cleaned up our dependencies pretty well now, so it's silly that users can't just do "pip install megadetector". We should get on that.
There is a whole cluster of related issues that basically relate to our preferred version of PyTorch approaching obsolescence. This includes M1 support and updates to the inference code. The right path forward is almost certainly not to incrementally handle every imaginable code path or to make all of these changes independently; rather, I think it's time to use a combination of the approaches suggested by @persts and @sim-kelly on issues 297 and 312 to create an official MDv5.01, thoroughly test it on all the platforms we support, and thoroughly test it on a number of datasets to quantify the impact on results, which is almost certainly negligible, but not zero. After that we can update our dependencies to current versions of PyTorch and YOLOv5.
We currently run inference on a single image at a time; there may be a performance benefit to using PyTorch's batch inference functionality, which would require significant code changes. It would be a self-contained task to evaluate this speedup and make sure the results are the same, then we can decide whether to make more significant changes to run_detector_batch.py. (Obsolete if someone instead wraps the native YOLOv5 inference scripts to replace run_detector_batch entirely.)
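
As a starting point for step (b) of the YOLOv5 item above, here is a rough sketch of converting one line of YOLOv5 text output to an MD-style detection. It assumes detect.py was run with --save-txt and --save-conf, so each line reads "class x_center y_center width height conf" in normalized coordinates, and that MD detections use string category IDs with a normalized [x_min, y_min, width, height] box. The function name is hypothetical, and the rounding precision is an arbitrary choice.

```python
def yolo_line_to_md_detection(line, coord_digits=4):
    """Convert one YOLOv5 label line to an MD-style detection dict.

    Expected input (an assumption about the YOLOv5 output format):
        "<class> <x_center> <y_center> <width> <height> [<conf>]"
    with all coordinates normalized to [0, 1].
    """
    parts = line.split()
    class_index = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])

    # Confidence is only present when --save-conf was used
    conf = float(parts[5]) if len(parts) > 5 else None

    # YOLO boxes are center-based; convert to top-left-based
    bbox = [x_center - width / 2, y_center - height / 2, width, height]

    return {
        # YOLOv5 classes are 0-based, MD categories are 1-based strings
        "category": str(class_index + 1),
        "conf": conf,
        "bbox": [round(v, coord_digits) for v in bbox],
    }
```

Checking step (a), that the results match run_detector_batch, would then be a matter of comparing these dicts against the existing output for the same images, within some small tolerance.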

Miscellaneous things that are more exploratory

This issue - which causes MD to do wild stuff on certain seasons of Snapshot Serengeti but not when the images are run at reduced resolution, a phenomenon we've never observed on any other data - continues to cause some angst, and I'd like to (a) continue to research the cause and (b) write some code to detect this issue automatically. We have MDv4/MDv5a/MDv5b results available for all of Snapshot Serengeti at both resolutions, and someone needs to do some exploratory analysis to see whether there's anything in particular (camera model? image size? image brightness?) that causes this, and/or write code that compares the full-resolution and reduced-resolution results and looks for bizarre things.
We have seen cases where someone is looking for Very Small Things in their images, and they're too small for MD to find with adequate recall, but if we break the images up into tiles that are 1280px wide, now we can see the Very Small Things. It would be nice to be able to do this seamlessly, including stitching the detection results together at the end.
We have a number of cases where MegaDetector works almost well enough to use, which unfortunately is the same as "not at all useful". Typically these are cases that are right at the boundaries of species or physical camera setups that we've seen in training: all-reptile datasets, cameras inside caves full of bats, etc. But many of those cases are likely within reach of some fine-tuning, and we just haven't stitched together a complete end-to-end tutorial for doing this, or guidelines for where it's likely to be useful and where it would be better just to start from scratch. I'm not even sure this project would need new code, we just haven't really taken this end to end for a real use case. This excellent Kaggle tutorial is a great start, and AIDE does a lot of the things I'm suggesting here, we just haven't really proved this process out for a camera trap use case.
Along the same lines, some folks use MegaDetector to generate synthetic training data for a smaller detector on ecosystem-specific, unlabeled training data, typically to run on embedded devices. We could use a nice tutorial for doing this too, and I'm about 90% sure the right downstream model is just a smaller YOLOv5 (or now YOLOv8) model.
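
For the tiling item in this list, the two pieces of glue that don't exist yet are choosing tile positions and mapping tile-relative boxes back to the full image. A minimal sketch of both, assuming normalized [x_min, y_min, width, height] boxes; the function names and the even-spacing strategy are my own assumptions, not the behavior of the existing splitting code.

```python
import math


def tile_starts(length, tile_size, min_overlap=0):
    """Start offsets (in pixels) for tiles covering one image dimension.

    Tiles are spaced evenly so the last tile ends exactly at the image edge,
    and adjacent tiles overlap by at least min_overlap pixels.
    """
    if length <= tile_size:
        return [0]
    stride = tile_size - min_overlap
    n_tiles = math.ceil((length - tile_size) / stride) + 1
    return [round(i * (length - tile_size) / (n_tiles - 1)) for i in range(n_tiles)]


def tile_box_to_image_box(bbox, tile_x, tile_y, tile_w, tile_h, image_w, image_h):
    """Map a box normalized to a tile into coordinates normalized to the image.

    tile_x and tile_y are the tile's top-left corner in image pixels.
    """
    x, y, w, h = bbox
    return [
        (tile_x + x * tile_w) / image_w,
        (tile_y + y * tile_h) / image_h,
        w * tile_w / image_w,
        h * tile_h / image_h,
    ]


if __name__ == "__main__":
    # Tile a 3000x2000 image into 1280px tiles with at least 100px of overlap
    xs = tile_starts(3000, 1280, min_overlap=100)
    ys = tile_starts(2000, 1280, min_overlap=100)
    print([(x, y) for y in ys for x in xs])
```

Animals that straddle a tile boundary will show up in more than one tile, so a real implementation would also need an overlap-based de-duplication step (something like non-maximum suppression) after mapping all the boxes back.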

Other projects that could use your help

If you found this text because you want to work on open-source code related to conservation, and everything I just listed is either too boring or too daunting, please don't give up! Depending on your specific skill set, maybe our close collaborators who maintain EcoAssist, Timelapse, or any of the platforms listed here could use contributions. Or head over to the "Open Source Solutions" forum at WILDLABS, and offer your skills there!

Random models someone should train

Now I'm letting this thread really veer off into a tangent, but FWIW, people frequently ask us "can MegaDetector do [x]?", where [x] is something MegaDetector definitely can't do. But there are some values of [x] that have come up a bunch of times and feel like the right balance of "tractable" and "useful", where there's sort of the right training data in the universe, and a focused student project could really get something going. So, to finish up this long post with lots of random ideas:

Not exactly model training, but... we often get requests to further categorize vehicle types (e.g. into car/motorcycle/bike/boat/etc.), and it just dawned on me that both COCO and ImageNet have a number of vehicle categories, and that a zillion models exist that have been pretrained on COCO and ImageNet, so we could plausibly get this "for free" with just some glue. I would start with the YOLOv8 COCO-trained detector and ImageNet-trained classifier, run either on whole images where MD predicted "vehicle" or on crops of those vehicles.
A model to classify camera trap images as obscured due to fog or snow
A model that runs as part of postprocess_batch_results.py to pick out "fun" images (currently we do this manually from the output of postprocess_batch_results, which is fast, but it means we're only ever searching over the ~7500 images we sample for postprocessing)
"MegaDetector for snakes"
"MegaDetector for fish"
Lots of camera trap data was recently ingested into Hugging Face, with the hope that someone might train a super-giant species classifier for camera trap data, and/or document a nice process for training regional classifiers. AFAIK no one has done either of the above yet.

patelvyom commented 1 year ago

Another to-do might be to rewrite batch detection scripts to use PyTorch Dataloader instead of managing image I/O manually. This will also allow performing batch inference instead of looping over each image one by one. It should significantly improve inference performance.

agentmorris commented 1 year ago

Updating my response to this suggestion: rather than investing time in using the PyTorch data loader, I'd like to see someone experiment with YOLOv5's native inference tools (val.py and detect.py) as a total replacement for our inference scripts. These have all the benefits of "proper" PyTorch data loading, but also have a zillion bells and whistles, especially test-time augmentation that could improve accuracy.

--

That's a great suggestion, I'll add an item to the list... more specifically, though, the item is to do a performance test (which can be arbitrarily inelegant) to see what the benefit would be, with and without a GPU, and make sure results are identical. If the benefit is more than around a 25% speedup, it's probably worth it. If it's less than that, it may be preferable to keep the current approach, which is easier to debug and maintain, and keeps a much longer shared code path across PyTorch and TF. Also I vaguely remember that images in a batch need to be the same size, which isn't guaranteed, so either the test would need to verify that this isn't the case, or the implementation would need to break batches when the image size changes.
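
The "break batches when the image size changes" option could be as simple as the sketch below, which only does the grouping and leaves loading and inference out. It assumes image sizes are known up front as (width, height) tuples; the function name is hypothetical.

```python
def batches_by_size(items, batch_size):
    """Group consecutive images into batches that all share one image size.

    items: iterable of (image_name, (width, height)) pairs, in processing order.
    Yields lists of image names; a batch is closed when it is full or when the
    next image has a different size, so the original order is preserved.
    """
    batch = []
    current_size = None
    for name, size in items:
        if batch and (size != current_size or len(batch) >= batch_size):
            yield batch
            batch = []
        batch.append(name)
        current_size = size
    if batch:
        yield batch


if __name__ == "__main__":
    images = [
        ("a.jpg", (1920, 1080)),
        ("b.jpg", (1920, 1080)),
        ("c.jpg", (2048, 1536)),
    ]
    for b in batches_by_size(images, batch_size=8):
        print(b)
```

Since camera trap folders tend to come from a single camera, most batches would probably end up full, but the performance test should confirm that on real data.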

Nidhi703 commented 10 months ago

Hey, I found the topic very interesting and useful and would like to contribute if allowed, or at least give it a proper try. This would be my first open source contribution and I would really need some guidance, so is there someone I could talk to about it, and maybe some easy tasks I could work on to get better at things?

zhmiao commented 8 months ago

Hello @Nidhi703, we are very sorry for the late reply. We totally missed your reply. Would you like to let us know which part of the list you want to contribute to? Thank you very much!

arky commented 8 months ago

@agentmorris @zhmiao Perhaps there is value taking some of these ideas and filing them as individual issues. I believe that would provide good contributor pathways for new community people to join the project.

@Nidhi703 Please consider joining the discord channel if you haven't already!

microsoft / CameraTraps