mikeyEcology / MLWIC

Machine Learning for Wildlife Image Classification

setting up MLWIC #32

Open mehrnoushmalek opened 5 years ago

mehrnoushmalek commented 5 years ago

Hello,

I'm trying to use your package but am having trouble setting it up. I am an R user and have no experience in Python. I've already installed Anaconda, but when I run `setup(python_loc = "~/anaconda3/")` I get this error:

    UnsatisfiableError: The following specifications were found to be incompatible
    with the existing python installation in your environment:
    Specifications: argparse -> python=2.6
    Your python: python=3.7

I know this error comes from Anaconda, but you may have an idea of how it can be fixed. The Python version in anaconda3 is 3.7, but when I type `python -V` I get 2.7.14. I searched a bit, but I wasn't able to fix this error.
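
In case simply ignoring the error isn't enough, here is a rough, untested sketch of one possible workaround: create a dedicated conda environment and point `python_loc` at it. The environment name ("mlwic") and the pinned Python version are assumptions, not part of the package's documented setup.

```r
# Untested sketch: make a dedicated conda environment for MLWIC and
# point the package at it. Names and versions below are assumptions.
system("conda create -y -n mlwic python=2.7")
system("conda install -y -n mlwic tensorflow")

# Then pass that environment's bin/ directory to setup() and classify():
library(MLWIC)
setup(python_loc = "~/anaconda3/envs/mlwic/bin/")
```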

mikeyEcology commented 5 years ago

You can ignore this error and proceed with `classify`.

mehrnoushmalek commented 5 years ago

I had to install TensorFlow in the conda environment, and it worked! I am now trying to train it on my own data; is there a guideline on how to select data for training? I have lots of images, and when I ran `classify` afterwards the top guess was always 0 (as I had set all the labels in my `image_labels` file for the test data to 0), and the true answer was sometimes the second guess and sometimes the third or fourth. I will try other options besides resnet-18, but your insights would be greatly appreciated.
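
As a side note on the `image_labels` file mentioned above, here is a minimal sketch of building it in R. The layout assumed here is two comma-separated columns (file name, numeric class ID) with no header; double-check the package README for the exact format it expects.

```r
# Sketch: build an image_labels.csv / data_info file for classify().
# Assumed layout: fileName,classID with no header and 0-indexed classes.
image_dir <- "~/Test/images"   # assumed path to the images being classified
files <- list.files(image_dir, pattern = "\\.(jpg|jpeg|png)$", ignore.case = TRUE)
labels <- data.frame(fileName = files,
                     classID  = 0L)  # all zeros when the true labels are unknown
write.table(labels, file.path(dirname(image_dir), "image_labels.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)
```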

mikeyEcology commented 5 years ago

I'm not sure why this happened, but my best guess is that your training dataset was too small. How many images were you using for training?

mehrnoushmalek commented 5 years ago

I used around 2,700 images. I just changed `depth` and `batch_size` to see if I get a better result, and I also removed the night/dark images.

    classify(path_prefix = paste0(test.image.path, "/"),
             data_info = paste0(renamed.path, "/image_labels.csv"),
             model_dir = paste0(working.dir, "/MLWIC"),
             python_loc = python.path,
             depth = 101,
             num_classes = 8,
             log_dir = paste0(working.dir, "/", train.image.path, "/"), # where the trained folder from train() is
             save_predictions = "model_predictions.txt")

When I run `classify` on a random sample of 100 images from the training set, I get a good fit, but when I test it on a new dataset, many images are mislabeled. I may also try densenet and googlenet to see if I get a better result. Did you train your algorithm only on close-up images? I have images where the zebra is in a corner or a bit farther away, and because of that the empty background is sometimes classified as some species. (Something like the attached image is misclassified as an elephant.)

(attached image: misclassified example)

If you think the training set is too small, I can make it bigger (closer to 8,000 images), but I couldn't find any information in the man page or the paper about what a good size would be.
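
To quantify "sometimes the second one was the true answer," a hedged sketch of ranking the true label against the top guesses is below. The column names (`answer`, `guess1`...`guess5`) are assumptions about what the saved predictions file contains; inspect `model_predictions.txt` first and adjust.

```r
# Sketch: how often is the true class the 1st, 2nd, ... top guess?
# Column names are assumed -- check model_predictions.txt before using.
preds <- read.csv("model_predictions.txt")
guess_cols <- paste0("guess", 1:5)
rank_of_truth <- apply(preds, 1, function(row) {
  hit <- which(row[guess_cols] == row["answer"])
  if (length(hit) == 0) NA_integer_ else hit[1]
})
table(rank_of_truth, useNA = "ifany")  # counts of top-1, top-2, ... hits
```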

mikeyEcology commented 5 years ago

This is a pretty small dataset. Generally the model will perform poorly with fewer than 2,000-5,000 images per species, especially if in some of those images the animal is far from the camera. I would try using resnet and set `depth = 152`. This will take longer, but the added depth should improve accuracy. I also wouldn't remove the dark images unless you're not planning to classify dark images later.

mehrnoushmalek commented 5 years ago

Thanks @mikeyEcology, so around 2,000-5,000 per species? I'll change the depth to 152, increase the number of files a bit, and see what happens. Again, thanks for guiding me through this; I really appreciate it.

mikeyEcology commented 5 years ago

Yes, per species, but maybe you'll get lucky with fewer images if you set the depth to 152. Please keep me posted on how the model does with your dataset.

mehrnoushmalek commented 5 years ago

Definitely, I will let you know once the training is finished. Thanks again!

mehrnoushmalek commented 4 years ago

Hello @mikeyEcology. I added more images to my training set and augmented some for species that didn't have enough. I wasn't able to run it at depth 152, but I tried 18, 34, and 50, and still don't get very good results (though baboons being classified as human was kind of true :-) ). One thing is that when I call `classify` it only goes up to batch 25, and then I get:

    Batch Number: 23, Top-1 Hit: 7, Top-5 Hit: 249, Top-1 Accuracy: 0.002, Top-5 Accuracy: 0.081
    OMP: Info #250: KMP_AFFINITY: pid 17059 tid 17145 thread 1 bound to OS proc set 1
    Batch Number: 25, Top-1 Hit: 7, Top-5 Hit: 249, Top-1 Accuracy: 0.002, Top-5 Accuracy: 0.077
    OMP: Info #250: KMP_AFFINITY: pid 17059 tid 17139 thread 1 bound to OS proc set 1
    OMP: Info #250: KMP_AFFINITY: pid 17059 tid 17160 thread 1 bound to OS proc set 1
    [1] "evaluation of images took 44.5969877243042 secs. The results are stored in...

This is for ~3,000 images.

Shouldn't the batch size be 128?

Also, is it possible to split my training set and train in multiple batches at depth 152? I didn't quite understand the `retrain` argument. Or is it possible to parallelize it? I don't have a GPU (20 cores, 128 GB RAM).

Thanks a lot for guiding me through your package. I tried Keras, but the manual was way too complicated to understand and test.

mikeyEcology commented 4 years ago

Hi @mehrnoushmalek, I'm not sure I understand exactly what the problem is. It might be worth training with MLWIC2 instead, as it should be a little smoother, especially with regard to retraining. If you use that package, you won't have to do any of the extra setup, but you'll have to download the helper files that you use instead of L1. In the `train` function (and the `classify` function) in MLWIC2 you can specify `num_cores`.

mehrnoushmalek commented 4 years ago

Thanks @mikeyEcology for your response. I already have a Shiny and post-analysis setup built around my code :-) It seems that MLWIC2 mostly differs in the setup, but the backbone algorithm is the same (except for the thread/core number), right? I mean epochs, depth, etc. In MLWIC there's no batch argument in `classify`, so if I change 128 to something else I only get 0 as my species label; I just checked, and this is fixed in MLWIC2. Regarding the batch-number message, is it OK that I only see it up to batch 25? Could you please tell me how I can use `retrain_from`? That would help me a lot. I just want to know whether it's possible to split my training set, so I can run it at a higher depth and then retrain until all images have been used for training.

mikeyEcology commented 4 years ago

It is okay that you get that message at batch 25; I wouldn't worry about this. I'm not sure that `retrain_from` is going to work in this package, but let's see. Can you provide a list of what's in your L1 directory? You can do this by setting your working directory to L1 and then typing `system("ls")`. Also, can you please provide the code you passed to `train`?

mehrnoushmalek commented 4 years ago

Thanks again @mikeyEcology. I have to say your responsiveness and effort in helping me are really appreciated, compared to other R package maintainers in our field. So thanks again for spending the time to help. This is what I have in my L1:

    architectures  arch.py  arch.pyc  ConfMatrixID.py  data_info.csv  data_info_train.csv
    data_loader.py  data_loader.pyc  eval.py  import_tf.py  job.sh  man  MLWICtrain
    model_predictions.txt  __pycache__  train.py  USDA182

And this is the code for training:

    MLWIC::train(path_prefix = "~//Train/images/",
                 data_info = "~/image_labels.csv",
                 model_dir = "~/Project-SVN/Model/",
                 python_loc = "~/anaconda3/bin/",
                 os = "Mac",
                 num_classes = length(unique(species)),
                 delimiter = ",",
                 architecture = "resnet",
                 depth = "18",
                 batch_size = "128",
                 log_dir_train = "~//MLWICtrain/",
                 retrain = FALSE,
                 num_epochs = 55,
                 print_cmd = FALSE)

And the classify:

    MLWIC::classify(path_prefix = test.image.path, # the absolute path to the images
                    data_info = paste0(dirname(test.image.path), "/image_labels.csv"),
                    model_dir = model.path,
                    python_loc = python.path,
                    depth = 18,
                    num_classes = length(unique(species)),
                    log_dir = train.image.path,
                    save_predictions = "model_predictions.txt")

I removed the timestamp and logo from the images and ran the training again at depth 18; now I get 50% accuracy, compared to 30% before (only looking at the first prediction and comparing it to the actual labels). I will run depth 50 now and see if it gets better. Do you think it's possible to pass `num_cores` to the `train` function and then on to `train.py`, or would it need more adjustments? That might make things a bit faster if it's parallelized, as it takes around 2.5 days to run at depth 50. Do you think reducing the batch size might improve the results? It's not an option in the version I have, as it's not exposed in `classify`, but it should be an easy fix.

mikeyEcology commented 4 years ago

For retraining, your progress should be saved in MLWICtrain. This directory should contain some files called "Snapshot...". If not, can you run `system("ls -lht")` after setting this as your working directory? This package isn't currently set up to take `num_cores`, but the default is 20 cores, so that's probably what you want. Likewise, the default batch size is 128, which should be fine, but I'll try to update the package so that you can modify this as well. If resnet isn't working well, you also might want to try densenet as your architecture. I provide more details on using different architectures if you run `?train`.
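
If you try densenet, a sketch of the switch is below, reusing the arguments from the `train()` call earlier in this thread. The densenet depth shown is a placeholder and the new `log_dir_train` name is only illustrative; run `?train` to see which depths each architecture accepts.

```r
# Sketch: same training setup as above, but with densenet.
# The depth value is a placeholder -- check ?train for supported depths.
MLWIC::train(path_prefix   = "~//Train/images/",
             data_info     = "~/image_labels.csv",
             model_dir     = "~/Project-SVN/Model/",
             python_loc    = "~/anaconda3/bin/",
             os            = "Mac",
             num_classes   = length(unique(species)),
             delimiter     = ",",
             architecture  = "densenet",
             depth         = "121",                # placeholder depth
             batch_size    = "128",
             log_dir_train = "~/MLWIC_densenet/",  # illustrative new log dir
             retrain       = FALSE,
             num_epochs    = 55)
```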

mehrnoushmalek commented 4 years ago

Thanks @mikeyEcology. I actually have 16 cores and 128 GB RAM, but I guess it'll use all cores; I have no idea how parallelization works in Python. Yes, I have the snapshots in my MLWICtrain. But I just want to know whether retrain works the way I'm thinking: splitting the training set into multiple subsets, running resnet with depth 101/152 on the first subset, then running on the second subset while retraining from the first one, and so on. I started the depth-50 run, so I'll check whether it gives me a better result than 18, and if so, I'll try higher depths with the retrain mode. Out of all these architectures, resnet worked best on your data, right? I will try the other ones as well to see what they give me. When you had ~90% accuracy, was the first prediction the correct one with good confidence? Sometimes my second prediction is the correct one, so I'll grab those when the first confidence is not that high and there's not a big difference between the first and second confidence.
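
A sketch of the "fall back to the second guess" rule described above is below. The column names (`guess1`, `guess2`, `confidence1`, `confidence2`, `answer`) and the two thresholds are assumptions used only to illustrate the idea; adapt them to the actual contents of the predictions file.

```r
# Sketch: use guess2 when guess1 is not confident and the two
# confidences are close. Column names and thresholds are illustrative.
preds <- read.csv("model_predictions.txt")
low_conf  <- preds$confidence1 < 0.6                        # assumed threshold
small_gap <- (preds$confidence1 - preds$confidence2) < 0.1  # assumed threshold
preds$final_guess <- ifelse(low_conf & small_gap, preds$guess2, preds$guess1)
mean(preds$final_guess == preds$answer)  # accuracy with the fallback rule
```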

mikeyEcology commented 4 years ago

Yes, it will work on all of your cores. I don't think you're using retrain the right way. I would train a model on all of your data (or on 90%, so you can test on the other 10%). If you have to stop in the middle of training, you can use `retrain_from = "MLWICtrain"` and it will pick up where you left off. If you are retraining, you'll need to use the same architecture and depth as you did the first time you trained. We didn't test other architectures because resnet-18 worked and it takes a long time to train when you have millions of images. Densenet has been shown to outperform resnet in some tests. For accuracy, we use the top guess in our calculations.
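
A sketch of resuming an interrupted run is below, combining the `retrain_from = "MLWICtrain"` usage mentioned above with the `train()` arguments shown earlier in this thread. Whether `retrain_from` actually works in this package is exactly what is being discussed here, so treat this as untested.

```r
# Sketch: resume training from the saved snapshots in MLWICtrain.
# Architecture, depth, and batch size must match the original run.
MLWIC::train(path_prefix   = "~//Train/images/",
             data_info     = "~/image_labels.csv",
             model_dir     = "~/Project-SVN/Model/",
             python_loc    = "~/anaconda3/bin/",
             os            = "Mac",
             num_classes   = length(unique(species)),
             delimiter     = ",",
             architecture  = "resnet",
             depth         = "18",
             batch_size    = "128",
             log_dir_train = "~//MLWICtrain/",
             retrain       = TRUE,
             retrain_from  = "MLWICtrain",   # usage mentioned above; untested
             num_epochs    = 55)
```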

mehrnoushmalek commented 4 years ago

Thanks Mikey. Maybe I wasn't clear about the retrain procedure. I assumed it would let me do my training in multiple batches, not split the data into training and testing. The point was to be able to go to a higher depth, but it seems that's not the case.

mikeyEcology commented 4 years ago

You could split your data and re-train, but I wouldn't advise it. Have you read of others using a similar approach? You are dealing with a small dataset, correct? I would advise using as many images as possible for training.

mehrnoushmalek commented 4 years ago

Yes, my dataset is not that big. I have around ~2,000 images for most species, but some species weren't that common, so there aren't that many photos of them. Maybe I need to check which species you have in your dataset and use them as well. Do you mean a similar approach besides MLWIC? I looked into Keras, but it wasn't as easy to set up as this package. There are multiple camera traps, but I decided to train each camera separately, as the angles and backgrounds are very different. I will see if it makes sense to merge them all together.

mikeyEcology commented 4 years ago

I meant a similar approach as far as training different models for subsets of your images. This is a pretty new technology, and I wasn't sure if someone else had tried it. You could do it, but I think you'll get much higher accuracy if you include all images in one model. It might be best to start this way anyway. But there are a lot of questions about best practices that remain unanswered.

mikeyEcology commented 4 years ago

Also, in the latest model I have gotten high recall rates (> 95%) for some species with fewer than 1,000 images using resnet-18, so if you are using a deeper resnet, you can probably get good accuracy with your dataset.

mehrnoushmalek commented 4 years ago

So I tried resnet-101, and only about 50 more images were classified correctly compared to resnet-18 (for a test set of 3,237 images). I also tried to run densenet, but I got the error `NameError: name 'common' is not defined` in `getModel`. Do you happen to know how to fix this error?