mikeyEcology / MLWIC

Machine Learning for Wildlife Image Classification

Photo preparation for MLWIC training: three questions #18

Closed · Nova-Scotia closed this issue 5 years ago

Nova-Scotia commented 5 years ago

I am curating a set of my own images for training a model within MLWIC.

1. Small animals and algorithm accuracy

I noticed that during initial tests of the "out-of-box" algorithm with MLWIC, classification of smaller animals (e.g., squirrels, birds, mustelids, chipmunks, rabbits) seemed to result in erroneous species IDs (another species, or empty) more often than for larger species such as bears, white-tailed deer, and humans. I assume this has to do with shrinking the photos to 256 x 256 pixels. For example, none of the Mustelidae photos I fed to the algorithm were identified as Mustelidae.
Is there anything I can do to mitigate this?

2. Widescreen images for training

Some of the photos in my "training" dataset were taken at a different aspect ratio (widescreen, 16:9) than others (standard, 4:3). Shrinking the widescreen photos to 256 x 256 will result in more horizontal squishing (very technical, I know) than for a standard photo. I could crop the widescreen images to standard, but that is problematic because many animals appear at the far left or far right of the frame, so I would likely end up with images tagged as containing an animal that no longer show one.
Should all images be in the same aspect ratio before training?

3. Data processing and augmentation

In Tabak et al. (2018), the authors state: "Images were resized to 256 x 256 pixels following the methods and using the Python script of Norouzzadeh et al. (2018). To have a more robust model, we randomly applied different label-preserving transformations (cropping, horizontal flipping, and brightness and contrast modifications), called data augmentation...".

a) Should these steps (resizing and the other transformations) be applied to my images before running classify or train? b) Do I need to do the data augmentation myself, or does the package handle it?

mikeyEcology commented 5 years ago

These are all good questions.

For 1) I don't know that we can conclude this is a body size issue. If you look at Table 1 in the MEE paper, we also found lower recall rates for some of the smaller mammals. Recall for Mustelidae was pretty low (77%); were any of your mustelids IDed by the model as another animal species in Mustelidae? There could be several factors contributing to low recall for small mammals, and with the small size of the dataset (few species in the model) it would be hard to conclude that it is due to body size. This is something that I'd like to test, though: if we control for the number of images per species and the number of study sites from which images were taken, what factors do affect accuracy? It's possible that body size is important.

For 2) I don't think that resizing will squish the image; instead, I think there will be dead space above and below, which shouldn't be a problem. I would avoid cropping because you don't want to lose those animals in the periphery. Ideally, all images should be the same aspect ratio for training, but this is not something that we've tested, and you'd have to look at the literature to see if anyone else has. It's an important question going forward.

For 3) a) ideally, one would take both steps before running classify, but it isn't necessary. It would be great if someone would compare performance using both approaches. We used this Python script, where src_dir is the path to where the (unmodified) images are stored, and dst_dir is the path to where you want the resized images to go. We have had some issues with losing images when running this code, though, which is why it isn't included in the package. b) No, but whatever modifications you make (or don't make) when training a model, you should classify images using the same modifications.
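The gist of that resizing step is something like the sketch below. This is only a rough sketch, not the actual script from Norouzzadeh et al.; it assumes Pillow is installed, and the directory paths are placeholders.

    # Rough resize sketch (not the actual script): copy every image in
    # src_dir to dst_dir at 256 x 256, reporting files that can't be read.
    import os
    from PIL import Image

    src_dir = "path/to/unmodified/images"   # placeholder
    dst_dir = "path/to/resized/images"      # placeholder
    os.makedirs(dst_dir, exist_ok=True)

    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        try:
            with Image.open(src_path) as img:
                img.convert("RGB").resize((256, 256)).save(os.path.join(dst_dir, name))
        except OSError:
            # Report unreadable files rather than losing them silently.
            print("could not resize:", src_path)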

Nova-Scotia commented 5 years ago

Thanks so much for taking the time to consider my questions!

1)

My Mustelidae were classified as Black Bear (7%), Bobcat (1%), Canidae (4%), Elk (5%), Empty (54%, though some photos likely were empty), Mule deer (7%), Rabbits (7%), Raccoon (3%), Squirrel (1%), White-tailed deer (2%), and wild pig (9%). I haven't looked at the NACTI dataset yet, but maybe our photos had more background noise (trees, stumps, etc.) and the algorithm had more trouble with that?

2)

Yeah, I'll have to think more on this one. You'd hope that the algorithm wouldn't look at them and decide, "All skinny things are species X", especially if your only training images for species X come from the location that uses the widescreen format. One thing: I am not sure I understand what you mean by dead space above and below. Here's an example. The original photo: [image: original widescreen photo]

The photo at 256 x 256: [image: widescreen photo squished to 256 x 256]

The photo at 256 x 256 with "dead space" used to transform widescreen to standard: [image: photo padded to square, then resized to 256 x 256]. Is this what you mean? Is it possible the model would read the "dead space" as an attribute associated with that species?
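(For reference, a rough Pillow sketch of how one could produce the two 256 x 256 versions above; the file names are placeholders:)

    # Rough sketch, assuming Pillow; file names are placeholders.
    from PIL import Image

    img = Image.open("original_widescreen.jpg")

    # Version 1: resize directly to 256 x 256 (horizontal squishing).
    img.resize((256, 256)).save("shrunken_256.jpg")

    # Version 2: paste onto a square black canvas first ("dead space"
    # above and below), then resize, so the scene is not distorted.
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), (0, 0, 0))
    canvas.paste(img, (0, (side - img.height) // 2))
    canvas.resize((256, 256)).save("emptyspace_256.jpg")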

3)

a) Thanks, that's helpful. Strange about the script eating photos. Let me confirm: do you mean that rescaling and normalizing should be run before classifying, AND before training?

b) I want to be sure that I'm understanding you correctly so I don't reinvent the wheel! I was looking around to find out how the augmentation process works, and inside the L1 folder there is a Python script, data_loader.py, which seems to apply these random transformations. data_loader.py is then loaded by the train.py script, which is called from MLWIC::train. I'm not familiar with Python, so forgive me if I've misinterpreted how that all works together! But are you saying that these bits of code (below, from data_loader.py) aren't in fact used?

 def _train_preprocess(reshaped_image, args):
  # Image processing for training the network. Note the many random
  # distortions applied to the image.

  # Randomly crop a [height, width] section of the image.
  reshaped_image = tf.random_crop(reshaped_image, [args.crop_size[0], args.crop_size[1], args.num_channels])

  # Randomly flip the image horizontally.
  reshaped_image = tf.image.random_flip_left_right(reshaped_image)

  # Because these operations are not commutative, consider randomizing
  # the order of their operations.
  reshaped_image = tf.image.random_brightness(reshaped_image,
                                               max_delta=63)
  # Randomly changing contrast of the image
  reshaped_image = tf.image.random_contrast(reshaped_image,
                                             lower=0.2, upper=1.8)

  # Subtract off the mean and divide by the variance of the pixels.
  reshaped_image = tf.image.per_image_standardization(reshaped_image)

  # Set the shapes of tensors.
  reshaped_image.set_shape([args.crop_size[0], args.crop_size[1], args.num_channels])
  #read_input.label.set_shape([1])
  return reshaped_image
mikeyEcology commented 5 years ago

1) This is interesting, and it seems like a pretty serious problem in terms of out-of-sample accuracy. There could be many different causes making the model perform badly, and it would be difficult to determine which. It's unfortunate that the model is working so poorly on the same species in a new dataset, but this is something the field needs to consider going forward: the model can be really bad at out-of-sample images. It might be worth publishing this result as a note somewhere if you have spare time (as all scientists do).

2) Yes, I was referring to the latter, where you have the blank space above and below. My concern would be that the model evaluates everything in that image (including the dead space). But looking at your first example of modification, I don't think this would be a problem, because the model looks more at angles, edges, and motifs, and making something skinnier shouldn't have that big of an effect.

3) No, you are correct. Sorry for misleading you before; I was looking at some different code. Data augmentation does occur in the train function.
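If you ever want to see what those transforms do to one of your own images, a quick sketch like the one below would work (TensorFlow 1.x). This isn't part of MLWIC, and the crop_size values are made up for illustration.

    # Quick sanity check (not part of MLWIC): run one image through the
    # _train_preprocess function quoted above. crop_size is hypothetical.
    import argparse
    import tensorflow as tf

    args = argparse.Namespace(crop_size=[224, 224], num_channels=3)

    raw = tf.read_file("example_camera_trap_photo.jpg")  # placeholder path
    image = tf.image.decode_jpeg(raw, channels=3)
    image = tf.image.resize_images(image, [256, 256])
    image = tf.cast(image, tf.float32)
    augmented = _train_preprocess(image, args)  # from data_loader.py above

    with tf.Session() as sess:
        print(sess.run(augmented).shape)  # -> (224, 224, 3)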

Nova-Scotia commented 5 years ago

Thanks very much for your insights!

Yes, in 3) I was afraid I had misinterpreted which Python code the train function was calling, because there is also a different data_loader.py inside my anaconda/python folder; but looking deeper, I found that it doesn't use the arguments present in MLWIC's train function. Funny, looking through code you're not familiar with sometimes feels like detective work! I sure do appreciate all the helpful comments inside the code!

tundraboyce commented 5 years ago

Hi @Nova-Scotia, I just wanted to check in about your accuracy measures and see if you've found anything that improves accuracy for your images when using the MLWIC model. I'm getting around 35% accuracy for my images so far; one of the main difficulties seems to be distinguishing between horses, cattle, and elk. For example, of 15,000 images of horses, only 13 were ID'd as horse, with almost all the others ID'd as cattle; and only 250 images of elk are present, yet 4,000 images were ID'd as elk, a fair number of them actually empty. Similarly, distinguishing between vehicles and humans seems troublesome.

I currently have around 60k classified images (with many more coming soon) and am working to get the train function running.

Did you manage to train a model using your images? And was accuracy better when/if this was successful?

Thanks in advance!

Nova-Scotia commented 5 years ago

Hey @tundraboyce, thanks for the shout-out. Currently the algorithm is about 38-58% accurate for my images (38% was the first test, of 3,000 images covering a very wide range of species; 58% was the second test, of 77,000 images covering a narrower range of species). Most images in the second test were of black bears (often ID'd as wild pigs), empty, humans, moose (often ID'd as cattle or elk), rabbits (empty or wild pig), and white-tailed deer (elk and empty). However, the "real" accuracy is likely somewhat different because our images aren't classified 100% correctly (they are classified by "event" or "photo series", not by individual image). Before we can train a model, we need to fix this by manually relabeling any empty images currently tagged as animals.

I have a user account set up with a computing cluster but I'm not looking forward to training a model as I'm anticipating a lot of troubleshooting!

It would be nice to communicate with you directly about this to support each other in this research - please feel free to email me directly.

mikeyEcology commented 5 years ago

Hi @tundraboyce and @Nova-Scotia, I'm starting to collect data on how the built-in model works on different users' data. Please consider sharing your results as I've described here. Essentially, I want to publish a note showing that the model often performs poorly on new datasets. In the paper we show that the model performed reasonably well on an out-of-sample dataset from Canada, and I think it's important to demonstrate that this is not always the case.

As for working on an HPC cluster, I just want to warn you that it can be very difficult to set up TensorFlow and CUDA on these types of machines. Depending on who provides access to your HPC, they might be able to help.

tundraboyce commented 5 years ago

Hi @mikeyEcology and @Nova-Scotia. No problem on providing the results from classify. I'll send through a CSV once I calculate the accuracy for each class you outlined.
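In case it's useful to anyone else, this is roughly how I plan to compute it (a rough sketch; the column names are placeholders, not the actual format of the classify output):

    # Per-class accuracy (recall) from a CSV with one row per image,
    # holding the manually assigned and model-predicted class.
    # Column names are placeholders for whatever the output actually uses.
    import pandas as pd

    df = pd.read_csv("classification_results.csv")   # placeholder file
    df["correct"] = df["true_class"] == df["predicted_class"]

    per_class = df.groupby("true_class")["correct"].agg(["mean", "size"])
    per_class.columns = ["recall", "n_images"]
    print(per_class.sort_values("recall"))
    print("overall accuracy:", df["correct"].mean())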

Great idea on communicating directly - I'll feel less reserved about asking stupid questions! I will send an email later today to touch base.