mikeyEcology / MLWIC

Machine Learning for Wildlife Image Classification

Species tags #23

Closed: pmcgove closed this issue 5 years ago.

pmcgove commented 5 years ago

Hi,

When classifying images with the default model, you suggest assigning an ID of 0 to images whose species is unknown. However, according to the species ID .csv, 0 = Moose. How would you suggest differentiating between moose and unknown species if both occur in the same dataset?

Along similar lines, is it possible to exclude species categories from the existing model when classifying images? I've run the model on a set of 5000 images from the Midwestern U.S. and had a top-1 accuracy of 0.171. I'm curious if I can censor species like mule deer to bump up the default model accuracy.
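For reference, top-1 accuracy is the fraction of images whose highest-ranked prediction matches the true label. The sketch below is a minimal Python illustration, assuming the true and predicted species labels have already been paired up per image. MLWIC itself is run from R; the function name `top1_accuracy` and the labels used here are hypothetical.

```python
from typing import Sequence


def top1_accuracy(true_labels: Sequence[str], top1_predictions: Sequence[str]) -> float:
    """Fraction of images whose top-ranked prediction equals the true label."""
    if len(true_labels) != len(top1_predictions):
        raise ValueError("true_labels and top1_predictions must have the same length")
    if not true_labels:
        raise ValueError("need at least one image")
    correct = sum(t == p for t, p in zip(true_labels, top1_predictions))
    return correct / len(true_labels)


# Toy labels for illustration only (not the results reported in this thread).
truth = ["white_tailed_deer", "white_tailed_deer", "empty", "raccoon"]
preds = ["wild_pig", "white_tailed_deer", "empty", "white_tailed_deer"]
print(top1_accuracy(truth, preds))  # 0.5
```

With this definition, an accuracy of 0.171 on 5000 images means that roughly 855 top-ranked predictions matched the true label.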

Thanks!

mikeyEcology commented 5 years ago

For the first question, you could use another ID. I just chose 0 because it is a number that has to be in the dataset if you are using a model that you trained on fewer species. If you're using the built-in model, you could use any number between 0 and 26. Or you could use 0 and keep a separate data table showing the true ID of each row.

Unfortunately, there is no way to restrict the categories; this is a limitation. One potential approach, if you're working in an area with no mule deer, is to assume during post-hoc data sorting that all deer the model sees are white-tailed deer (but you would want to be sure to check this first).
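Both suggestions (a placeholder ID plus a separate table of true labels, and post-hoc relabeling of a species that is absent from the study area) can be sketched in Python. This assumes the model's numeric predictions have already been translated into species names using the species ID .csv. The row layout, function names, file names and species labels are all hypothetical and do not reflect MLWIC's actual input or output format.

```python
from typing import Dict, List

# Any ID from 0 to 26 is valid with the built-in model; 0 is used here as in the thread.
PLACEHOLDER_ID = 0


def build_input_rows(image_names: List[str]) -> List[Dict[str, object]]:
    """Rows for the classification input: every image gets the same placeholder ID,
    so the ID carries no information about the true species."""
    return [{"image": name, "class_id": PLACEHOLDER_ID} for name in image_names]


def attach_true_labels(
    predictions: Dict[str, str], true_labels: Dict[str, str]
) -> List[Dict[str, str]]:
    """Join predicted labels with a separately kept table of true labels, keyed by
    image name. Images missing from the side table are marked 'unknown'."""
    return [
        {"image": image, "predicted": pred, "true": true_labels.get(image, "unknown")}
        for image, pred in predictions.items()
    ]


def remap_absent_species(
    rows: List[Dict[str, str]], absent_to_present: Dict[str, str]
) -> List[Dict[str, str]]:
    """Post-hoc sorting: relabel predictions of species known to be absent from the
    study area as a look-alike species that is present. Returns new rows."""
    return [
        {**row, "predicted": absent_to_present.get(row["predicted"], row["predicted"])}
        for row in rows
    ]


# Hypothetical file names and labels, for illustration only.
input_rows = build_input_rows(["IMG_0001.JPG", "IMG_0002.JPG"])
predicted = {"IMG_0001.JPG": "mule_deer", "IMG_0002.JPG": "moose"}
side_table = {"IMG_0001.JPG": "white_tailed_deer"}

rows = attach_true_labels(predicted, side_table)
rows = remap_absent_species(rows, {"mule_deer": "white_tailed_deer"})
for row in rows:
    print(row)
```

Because the placeholder ID is never read back, a "moose" prediction in the output is a genuine model prediction and cannot be confused with an unknown-species placeholder. The remapping step is only safe once you have confirmed that the absent species really does not occur in the study area.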

pmcgove commented 5 years ago

Thanks for the quick response, Mikey. Impressively, the default model only tagged 70 out of 4700 white-tailed deer images as mule deer. It had a much higher tendency to classify them as wild pig (1500) or empty (1000).
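A breakdown like the one above (what the model predicted for images that are truly white-tailed deer) is a single row of a confusion matrix. Here is a minimal Python sketch of that tally, assuming (true label, top-1 prediction) pairs are available per image. The function name and toy data are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def predicted_labels_for(pairs: Iterable[Tuple[str, str]], true_species: str) -> Counter:
    """Count what the model predicted for every image whose true label is `true_species`.

    `pairs` holds (true_label, top1_prediction) tuples, one per image."""
    return Counter(pred for true, pred in pairs if true == true_species)


# Toy data for illustration only (not the thread's results).
pairs = [
    ("white_tailed_deer", "wild_pig"),
    ("white_tailed_deer", "empty"),
    ("white_tailed_deer", "wild_pig"),
    ("white_tailed_deer", "mule_deer"),
    ("raccoon", "wild_pig"),
]
counts = predicted_labels_for(pairs, "white_tailed_deer")
print(counts.most_common())  # [('wild_pig', 2), ('empty', 1), ('mule_deer', 1)]
```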

I tried to retrain the default model on our classified images, but kept running into errors. However, if I set retrain=FALSE, it ran fine. Does changing the number and/or IDs of species classes while retraining cause problems?

mikeyEcology commented 5 years ago

It is not working well on your images, and retraining looks like the best option for you. If you are changing the number of classes, you will need to specify retrain=FALSE; otherwise, you will have problems. Just changing the class IDs while keeping the same number of classes will not cause an error if you specify retrain=TRUE, but it is not a good idea. If you only have 5,000 classified images, though, your new model will also likely not have very good accuracy.
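The rule of thumb in the reply above can be written as a small pre-flight check. This is a hypothetical Python helper, not part of MLWIC, which is called from R. It encodes only this thread's advice, and it additionally assumes that class IDs are contiguous integers starting at 0 (the thread confirms only that 0 must be present and that the built-in model uses IDs 0 to 26).

```python
from typing import Iterable

# The built-in model has 27 classes (IDs 0 to 26).
BUILT_IN_NUM_CLASSES = 27


def suggest_retrain_flag(class_ids: Iterable[int], ids_match_built_in: bool) -> bool:
    """Suggest a value for the `retrain` argument, following the advice in this thread.

    - Different number of classes than the built-in model: False (True causes problems).
    - Same number of classes, but the IDs mean different species: False
      (True runs without error, but is not recommended).
    - Same classes with the same meaning: True.
    """
    ids = sorted(set(class_ids))
    if not ids:
        raise ValueError("no class IDs given")
    # Assumption: IDs are contiguous integers starting at 0.
    if ids != list(range(len(ids))):
        raise ValueError(f"expected IDs 0..{len(ids) - 1} with no gaps, got {ids}")
    return len(ids) == BUILT_IN_NUM_CLASSES and ids_match_built_in


print(suggest_retrain_flag([0, 1, 2, 3], ids_match_built_in=False))  # False
print(suggest_retrain_flag(range(27), ids_match_built_in=True))  # True
```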

pmcgove commented 5 years ago

OK, that makes sense. We are just beginning image classification and plan to have a much larger dataset to train on by the end of summer. I just wanted to test out MLWIC and your model early on to inform our workflow.