microsoft / CameraTraps

PyTorch Wildlife: a Collaborative Deep Learning Framework for Conservation.
https://cameratraps.readthedocs.io/en/latest/

[building a potential Edge AI solution] As an ecologist, what are you looking to extract from camera traps? #176

Closed ejri closed 4 years ago

ejri commented 4 years ago

Hey all,

Not sure if this falls under an issue (I'm happy to post or x-post it somewhere more suitable), but this repo may be the best place to ask what needs to be integrated into camera traps, since people with technical backgrounds and some understanding of research needs are likely to stumble upon it.

I'm an AI for Earth grantee (agritech and its integration within smart cities). I learned about camera traps during the AI for Earth Summit and, coincidentally, as part of an online competition (Microsoft's Discover AI Upskilling Journey) in which the organizers asked participants to look into helping out with camera traps.

However, what is not particularly clear is: what exactly are ecologists looking to extract from camera trap images? Animals (detecting and classifying which animal)? Other objects related to poaching (people, cars, trucks)? Other features (weather conditions, timestamps of when detections happen)?

Understanding what needs to be extracted can help us (researchers and hobbyists) figure out what needs to be developed, and tailor the hardware and integration to fit.

Overview of my project:

The goal of this project is to highlight how effective Edge AI can be at eliminating or filtering out a significant number of intermediate steps between data collection and obtaining detection and classification results, potentially along with other data/features that need to be collected or extracted.

[Architecture overview diagram] Legend: green arrows -> working; red arrows -> not working as intended; blue arrows -> not yet tried in this project, but tested and working in other projects.

More information: The project aims to tie features from machine learning and the internet of things into hardware devices, similar in some aspects to what is currently used on the ground, in order to reduce the overall time ecologists spend on classification, data collection, etc. As someone who has never interacted with ecologists, it's difficult to know what they currently need in order to better allocate their time toward research rather than countless hours of manual data labelling.

Model: I trained a custom YOLOv4 object detection and classification model on 17 classes, covering all the animals I was able to manually make out in the MegaDetector demo video: Armadillo, Bear, Bird, Bull, [Car], Cat, Cattle, Deer, Dog, Fox, Monkey, [Person], Pig, Raccoon, Sheep, Tiger, [Truck]. I added a few classes in [brackets] in case someone finds the model useful as is.

The training images were obtained from the Open Images Dataset V6.
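For anyone reproducing this: Open Images ships its box annotations as normalized XMin/XMax/YMin/YMax, while darknet-style YOLOv4 training expects normalized center/size coordinates. A minimal conversion sketch (the function name and usage are mine, not part of this repo):

```python
# Hypothetical helper: convert an Open Images box (normalized XMin/XMax/YMin/YMax)
# to the YOLO label format (class_id, x_center, y_center, width, height),
# all values still normalized to [0, 1].
def openimages_to_yolo(class_id, xmin, xmax, ymin, ymax):
    return (class_id,
            (xmin + xmax) / 2,  # x_center
            (ymin + ymax) / 2,  # y_center
            xmax - xmin,        # box width
            ymax - ymin)        # box height

print(openimages_to_yolo(3, 0.1, 0.5, 0.2, 0.8))  # -> (3, 0.3, 0.5, 0.4, 0.6)
```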

Then, I used the model to run object detection and classification on some test images (the sheep image in the overview is one of the test images used) and the MegaDetector demo video.

Testing images (after ~5 hrs of training):

[Test detections: YOLOv4 TFLite (TensorFlow Lite model, converted from YOLOv4)]

[Test detections: original YOLOv4]
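For context on the TFLite step: the trained darknet weights first have to be exported to a TensorFlow SavedModel (there are community converters for this); from there, the conversion itself is only a few lines. A sketch, with the paths as assumptions:

```python
import tensorflow as tf

# Load the SavedModel export of the trained YOLOv4 network (path is illustrative;
# the darknet-to-TensorFlow export step is a separate, project-specific tool).
converter = tf.lite.TFLiteConverter.from_saved_model("yolov4_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional: shrink/speed up for the Pi

tflite_model = converter.convert()
with open("yolov4.tflite", "wb") as f:
    f.write(tflite_model)
```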

Video: running the YOLOv4 model on the MegaDetector demo video (the model took less than 2 minutes to detect and classify the whole video).

It seems to be pretty OK in some parts and needs more training in others. Some examples of good detection/classification:

[Four screenshots of correct detections/classifications]

Some examples of bad detection/classification (the model needs more training?):

[Two screenshots of incorrect detections/classifications]

Extracting more information from images

I was reading that ecologists also look at camera trap images to extract more information about the ecosystem, such as weather conditions.

My hand-waving approach was to use Custom Vision to train a weather classifier. I stumbled upon the Multi-class Weather Dataset for Image Classification, which contains 4 classes of weather conditions, and used it for training.
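Custom Vision is point-and-click, but for anyone who prefers a local baseline, an equivalent fine-tune is short. A sketch with torchvision (my substitution, not what I actually used), assuming the dataset is laid out as weather_dataset/<class_name>/*.jpg:

```python
import torch
from torch import nn
from torchvision import datasets, models, transforms

# Load the 4-class weather dataset from an ImageFolder-style directory tree.
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.ImageFolder("weather_dataset", transform=tf)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

# Fine-tune an ImageNet-pretrained ResNet-18 with a new 4-way head.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for epoch in range(5):  # a handful of epochs is plenty for a small dataset
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```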

[Screenshots: Custom Vision training results]

However, testing the model on some camera-trap images, it does not seem to transfer well: even when the classification was on point, the prediction confidence was really low.

[Screenshots: weather model tested on camera-trap images]

Another hand-waving approach to extract weather data from camera-trap images (haven't implemented this yet):

It seems that some camera traps stamp temperature data onto each image. So another approach is to extract the temperature reading, as well as the time/date stamp, via OCR, and then use a historical weather API (such as the OpenWeather API) to fetch other weather features: humidity, wind direction/speed, conditions (rainy, cloudy, partially sunny, sunny), etc.
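A rough sketch of that pipeline with pytesseract (the stamp location, regex, and API call are all assumptions; real camera models place the banner differently):

```python
import re
import requests  # only needed for the weather lookup at the end
from PIL import Image
import pytesseract

# Crop the assumed info banner at the bottom of the frame and OCR it.
img = Image.open("camera_trap_frame.jpg")
stamp = img.crop((0, img.height - 40, img.width, img.height))
text = pytesseract.image_to_string(stamp)

# Pull out a temperature like "23C" or "74F" (pattern is a guess at the stamp format).
temp = re.search(r"(-?\d+)\s*[CF]", text)
print("stamp:", text.strip(), "| temperature:", temp.group(1) if temp else None)

# With lat/lon and a UNIX timestamp parsed from the stamp, historical conditions
# could come from OpenWeather (endpoint and key are assumptions, left commented out):
# requests.get("https://api.openweathermap.org/data/2.5/onecall/timemachine",
#              params={"lat": lat, "lon": lon, "dt": unix_ts, "appid": API_KEY})
```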

Raspberry Pi as an Edge AI Device

[Photo: Raspberry Pi hardware setup]

As implied earlier, the main reason for converting the YOLOv4 model to a TensorFlow Lite model is to run it on the Raspberry Pi. This means that as an image or video is captured, object detection and classification happen on the device itself, and the results can either be uploaded directly to the internet or saved to an SD card.
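On the Pi itself, inference needs only the small tflite_runtime package rather than full TensorFlow. A minimal sketch, assuming a float32 model exported as yolov4.tflite (decoding the raw output into boxes depends on how the model was exported):

```python
import numpy as np
from PIL import Image
import tflite_runtime.interpreter as tflite  # pip install tflite-runtime

interpreter = tflite.Interpreter(model_path="yolov4.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize and normalize the frame to the model's expected input shape.
_, height, width, _ = input_details[0]["shape"]
img = Image.open("camera_trap_frame.jpg").convert("RGB").resize((width, height))
x = np.expand_dims(np.asarray(img, dtype=np.float32) / 255.0, axis=0)

interpreter.set_tensor(input_details[0]["index"], x)
interpreter.invoke()

# Raw detection tensor; box/class decoding is model-export-specific.
detections = interpreter.get_tensor(output_details[0]["index"])
print(detections.shape)
```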

Since weather information is relatively challenging to extract directly from images and videos, integrating relatively cheap sensors may provide a more customized overview of the ecosystem. For example, it is possible to add sensors that collect data on emissions such as COx, NOx, NH3, and VOCs, as well as air quality. This approach also allows more flexibility in choosing a camera that better fits custom applications, as well as detection sensors (in this example, a pyroelectric/infrared sensor in addition to an ultrasonic sensor); a trigger-loop sketch follows.
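To make the trigger side concrete, here is an illustrative loop for the PIR sensor (the GPIO pin, file path, and detection hook are all assumptions about my wiring, not a finished design):

```python
from gpiozero import MotionSensor
from picamera import PiCamera

pir = MotionSensor(4)  # PIR data line wired to GPIO 4 (assumed wiring)
camera = PiCamera()

while True:
    pir.wait_for_motion()              # block until the PIR reports movement
    camera.capture("/home/pi/frame.jpg")
    # run_tflite_detection("/home/pi/frame.jpg")  # hypothetical hook into the model above
    pir.wait_for_no_motion()           # let the scene settle before re-arming
```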

agentmorris commented 4 years ago

Thanks for sharing! I will try to answer one of your questions here:

What exactly are ecologists looking to extract out of camera trap images?

It varies from project to project, but some common themes:

I think these are relatively representative of the larger camera trap community, but take what I say with a grain of salt; there are a number of biases in whom we end up in contact with.

Hope that helps!

-Dan

agentmorris commented 4 years ago

Closing because this is a discussion, not an issue; happy to keep discussing here, but preventing this from showing up as an open issue.

abfleishman commented 4 years ago

@ejri I think Dan is right about what many ecologists are currently looking to extract, but I think that is because of the amount of work needed to extract each piece of information manually. If there were a system to automatically classify weather, I bet a bunch of researchers would want to use it! Oftentimes camera trap surveys are limited by what people think they can handle in terms of data review. For example, I might choose to put out only 15-20 cameras because it costs $1000 per camera to review the images and my budget can only handle so many, or I might choose motion-triggered events as opposed to timelapse settings because timelapse (1 frame every 1-2 seconds, on the order of tens of thousands of frames per camera per day) generates WAY too many images to review manually. But if you can use AI to process the images in a cost-effective way, then I might choose to scale my survey 10x or 100x!

Another way that I have used camera traps is to measure breeding success in birds. A camera is focused on a nest and takes timelapse photos, and you can extract info such as the number of adults, eggs, or chicks; the date of laying; the date of hatching; the date of egg or chick loss/death; and/or fledging. From these data you can extract info about nest predators and the conditions that might have led to the loss of a chick or egg. This is a great example of the kind of project that limits the number of cameras deployed because it is too much work to manually review the data, yet the sample size (in this case the number of nests monitored) is often much smaller than a statistician would recommend for making inferences across a population.

abfleishman commented 4 years ago

Another thing that would be useful to detect in images is image quality: e.g. is the image blurry, has the lens fogged up, is the frame completely black or white (under/overexposed)? This would be useful for removing images that are not useful for analysis (and for quantifying monitoring effort), and also, in a real-time context, for alerting the user that a camera needs attention (maybe a field visit).
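This seems very automatable. A simple heuristic sketch (my suggestion, not an existing CameraTraps feature; the thresholds are guesses and would need tuning per deployment): variance of the Laplacian for blur/fogging, mean brightness for exposure.

```python
import cv2

def quality_flags(path, blur_thresh=100.0, dark_thresh=30, bright_thresh=225):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    blur_score = cv2.Laplacian(gray, cv2.CV_64F).var()  # low variance -> blurry or fogged
    mean = gray.mean()                                  # near 0 or 255 -> bad exposure
    return {
        "blurry": blur_score < blur_thresh,
        "underexposed": mean < dark_thresh,
        "overexposed": mean > bright_thresh,
    }

print(quality_flags("camera_trap_frame.jpg"))
```

Flagging frames this way could both clean the analysis set and, streamed in real time, trigger the "camera needs a field visit" alert.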

beerys commented 4 years ago

Other things that I have heard requests for over the years:

  1. Counts for each species
  2. Classification of behavior
  3. Estimate of animal distance from the camera
  4. Estimate of animal age, or even just young/old classification
  5. Detection of disease in animals (through fur loss or skin conditions), to help track disease spreads
  6. Detection of molting
  7. Habitat categorization (i.e. classifying the plants that appear at a camera location, or tracking their change over time)
  8. Detection of invasive plant species

Just a few that were off the top of my head!

ejri commented 4 years ago

Thank you for your answers and insight, it certainly helps!

I'd imagine one would need a high-resolution camera for these:

  1. Detection of disease in animals (through fur loss or skin conditions), to help track disease spreads
  2. Detection of molting
  3. Habitat categorization (i.e. classifying the plants that appear at a camera location, or tracking their change over time)
  4. Detection of invasive plant species

We're kind of looking at 6 and 7 from the list above as part of our project (the agriculture one) as well, in addition to detecting diseases on plants, to check how well pesticides are working and potentially reduce pesticide usage (along with the associated emissions) by a large margin.
