mikeyEcology / MLWIC

Machine Learning for Wildlife Image Classification

Added instructions on resizing images. #17

Closed VwakeM closed 5 years ago

VwakeM commented 5 years ago

I used the EBImage package to resize pictures to 256 × 256 px as required by MLWIC. Thought this might be useful for someone looking for instructions to resize pictures. Also, I've added a few lines of code to create the data_info.csv with a zero-filled species ID column.
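
For reference, a minimal EBImage version of the resizing step might look like this (a sketch; the file paths are placeholders, and EBImage comes from Bioconductor, not CRAN):

```r
# Minimal EBImage resize sketch; install with BiocManager::install("EBImage").
# The file paths below are placeholders.
library(EBImage)

img <- readImage("original/IMG_0001.JPG")  # read the photo
img <- resize(img, w = 256, h = 256)       # force 256 x 256 px
writeImage(img, "resized/IMG_0001.JPG")    # save to the new location
```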

Nova-Scotia commented 5 years ago

Good timing @VwakeM, I just submitted an issue wondering whether MLWIC resizes and normalizes photos, as mentioned in the paper that M. Tabak et al. (2018) reference (Norouzzadeh et al. 2018).

VwakeM commented 5 years ago

I have similar questions @Nova-Scotia ! Part of submitting this pull request is to validate whether I'm doing the right thing :).

mikeyEcology commented 5 years ago

I have not used this package before, so I'm not sure how it works. Resizing is a relatively basic operation, so there are a number of options that will work. I would just check a random subset of your images before and after resizing to ensure that they look the same (just with fewer pixels). We used this Python script to resize, but it is certainly not the only option.

VwakeM commented 5 years ago

Thanks @mikeyEcology! I agree that this code does not need to be part of the package. There are several ways to do it. I'll close the pull request.

tundraboyce commented 5 years ago

This has obviously worked for you guys, but I just get a "cannot create jpeg" error. I've tried moving images to different drives, but beyond that there isn't much info on this error for the package. Did anyone come across this during resizing?

Nova-Scotia commented 5 years ago

Hey @tundraboyce, I used a different program to resize images, called magick. See more at the magick website. This is the code I used:

# Function tries to copy and resize images to a new location;
# however, the script does not fail if a file is corrupted -- it records
# whether resizing worked in a new vector.
process_image <- function(image.path, new.image.path){
  tryCatch(
    {
      # "256x256!" forces an exact 256 x 256 geometry, ignoring aspect ratio
      image_write(image_scale(image_read(image.path), "256x256!"), path = new.image.path)
      # on success, return the file extension (e.g. "JPG")
      return(substr(image.path, nchar(image.path) - 2, nchar(image.path)))
    },
    error = function(e) {
      # NA_character_ keeps the return type consistent for map2_chr()
      return(NA_character_)
    }
  )
}

Here is an example of how I would use the code:

library(magick)
library(tidyverse)

photos <- mydf$original_photo_path
newphotos <- mydf$new_smaller_photo_path

# This line resizes the photos and also records whether it was successful or not. 
# If successful it records "JPG", if not, `NA`.
result <- map2_chr(photos, newphotos, process_image) 

# record if photo was corrupted in my dataframe so I can check them out later
mydf$Imageclass <- result

Nova-Scotia commented 5 years ago

If you want to get REALLY fancy, you can use the package furrr and use future_map2_chr to do it in parallel, and make it really fast :)
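
For example (a sketch, assuming the furrr package is installed and process_image, photos, and newphotos are defined as in the earlier comment):

```r
# Parallel version of the resizing step -- a sketch, assuming the furrr
# package is installed and process_image/photos/newphotos are defined
# as in the earlier comment.
library(furrr)
plan(multisession)  # run workers in separate background R sessions

result <- future_map2_chr(photos, newphotos, process_image)
```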

tundraboyce commented 5 years ago

Thanks for the speedy reply! I'll give this a go and see if I can get it to work; maybe I'll even get fancy.

Dumb question, but did you import your images as a data.frame, rather than using a path to the drive folder?


Nova-Scotia commented 5 years ago

Hmm, not sure what you are asking. I had a data.frame that listed, for each image, where it was (e.g., original_photo_path = photos2017/Craggie_Lk/2017-05-08 12-36-08 M 1_5.JPG), and another field, new_smaller_photo_path, for where I wanted it to go (e.g., test_MLWIC/Craggie_2017-05-08 12-36-08 M 1_5.JPG).
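
One way to build such a data.frame (a sketch using only base R; the folder names here are made up):

```r
# Build a data.frame of old and new photo paths; "photos2017" and
# "test_MLWIC" are example folder names, not anything MLWIC requires.
photo_files <- list.files("photos2017", pattern = "\\.JPG$",
                          recursive = TRUE, full.names = TRUE)

mydf <- data.frame(
  original_photo_path    = photo_files,
  new_smaller_photo_path = file.path("test_MLWIC", basename(photo_files)),
  stringsAsFactors       = FALSE  # keep the paths as characters
)
```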

tundraboyce commented 5 years ago

Okay, cool, sorry, yes, I meant the path names as a DF. My "result" is all NAs and no resizing happens; not sure what I'm doing wrong, but maybe I'm destined not to resize anything.

tundraboyce commented 5 years ago

[screenshot: example_df]

Do my path names in my df need quotes? I've tried it both ways, but maybe it's something simple.

Nova-Scotia commented 5 years ago

@tundraboyce, try the function on just one picture and see if you can figure out the problem using better error messages from magick. You shouldn't need quotes in the file names.

e.g.,

image <- image_read(mydf$image.path[1])
image <- image_scale(image, "256x256!")
image <- image_write(image, path = mydf$new.image.path[1])

tundraboyce commented 5 years ago

It works perfectly when I put the full file path into image_read (i.e., don't try to call it from a DF).

But I get "Error in image_read(mydf$original_photo_path[1]) : path must be URL, filename or raw vector" when trying to call it from the DF of names.

The file path prints correctly when I just print it from the DF. Would the file paths need to be factors, or maybe characters?

Nova-Scotia commented 5 years ago

Filepaths should definitely be characters!
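
If the paths came into R as factors (e.g., from read.csv with default settings in older R versions), converting them is a one-liner. A self-contained sketch (the column name follows the earlier example):

```r
# If the data.frame was built with default settings in older R,
# path columns may be factors; convert them to characters.
# The column name here follows the earlier example.
mydf <- data.frame(original_photo_path = "photos2017/a.JPG",
                   stringsAsFactors = TRUE)  # simulate factor paths

mydf$original_photo_path <- as.character(mydf$original_photo_path)

# Or avoid factors entirely when reading from CSV:
# mydf <- read.csv("paths.csv", stringsAsFactors = FALSE)
```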

MelissaSt commented 5 years ago

Hi! Did you have problems with magick filling up your computer memory and not finishing the function? I guess there is a leak in the package? If so how did you handle this? Thank you!

Nova-Scotia commented 5 years ago

Hi Melissa, I did not have that issue... not sure what you mean by a leak, but I've only ever had great performance from magick!

MelissaSt commented 5 years ago

Thank you for your quick response. By leak, I mean magick seems to fill up my temporary files before it is finished running and then stalls RStudio. I have to run the code in small chunks and use gc(), and I can't seem to get that working in a loop.

Nova-Scotia commented 5 years ago

How many are you trying to process?

MelissaSt commented 5 years ago

Right now ~250,000 for a test run but I have ~850,000 more after that.

Nova-Scotia commented 5 years ago

I've done it with 81,000 and it's been fine... not sure what your computer specs are.

MelissaSt commented 5 years ago

Well, I believe that is part of my problem. I am running these off of two external hard drives because my computer does not have the space for all of them, so when magick dumps into the temp files, it fills up my computer and stalls. I am trying to find a way to clean up the dumped files as it processes them, but I have not found one yet. I have a Windows computer, I am not good with Python, and I am trying to work this all through R. But baby steps are working for now. I was just trying to automate everything and make it go faster.

Nova-Scotia commented 5 years ago

I don't remember magick even making temp files as it processed! Sorry I can't be more helpful.

MelissaSt commented 5 years ago

That's okay, thanks for answering me and trying; I appreciate it!

MelissaSt commented 5 years ago

I figured out an answer to my own question and thought I would share it here (is that okay?) just in case someone else has the same problem I did.

Using the package magick to resize 100k+ photos that are in subfolders for each camera trap, I was running out of space in my temp folder, even though image_read and image_write were pulling from external hard drives, because of something to do with magick that is beyond me. I made a function that loops through each folder and then cleans the garbage (gc()) in the temp folder so that I have enough space to cycle through them all with one line of code:

imageResizeR4 <- function(image.path, new.image.path){
  require(magick)
  for(i in seq_along(image.path)){
    image_write(image_scale(image_read(image.path[i]), "256x256!"),
                path = new.image.path[i])
  }
  gc()  # free memory / temp space after each batch
}

I made a data frame as suggested above by Nova-Scotia with the current photo path and the new photo path, and made them characters. To do all the folders: imageResizeR4(imdf$original_photo_path, imdf$new_photo_path)

To pick rows out of folders if they still have too many photos in them: imageResizeR4(imdf$original_photo_path[22702:262357], imdf$new_photo_path[22702:262357])

tywerdel commented 5 years ago

Trying to figure out if my computer is just super slow or if I'm doing something incorrectly: I am using the EBImage package, and it takes about a second per picture to resize each photo in a loop. I have 3 million photos to go through, so if this is how long it will take just to resize the photos, maybe it's not even worth it, since I can tag photos myself at 1 second per photo.

tundraboyce commented 5 years ago

I couldn't get EBImage to work, but with a Python script I could resize 2 images per second. That 72 hours of resizing was still useful for me personally, freeing me up to do anything other than tagging photos.

mikeyEcology commented 5 years ago

@tundraboyce didn't you also try running the functions without resizing first? Or am I mistaken?

tundraboyce commented 5 years ago

@mikeyEcology no I don’t think so, most of my issues were related to drives, info file names etc. Does the function work without resizing?

mikeyEcology commented 5 years ago

It is possible that it will work without resizing, but it might throw an error. @tywerdel do you want to try running without first resizing your images to see if it works and how long it takes? You could try it with a small subset of your images. Are you trying to classify or train?

tywerdel commented 5 years ago

I resized and classified 1 of my sites (7,500 photos). The classification only took ~30 seconds, but the resizing took >2 hours. As you suggested, I then ran classify without resizing; it took ~30 seconds on the original-size photos and returned similar results (so this is probably the better option). Unfortunately the results were not accurate, possibly because this particular camera was set on the edge of an agricultural field: the program classified nearly all the photos (corn in the background) as human and missed photos of coyotes and pheasants. I have 1.5 million photos from last year tagged at similar sites, so I feel like the next logical step, and probably the only way for this to work on my data, is to train.

mikeyEcology commented 5 years ago

Thank you for testing this @tywerdel. If you have a lot of classified images, it will work much better to train your own model. Low accuracy is not surprising given what others have found, although it will depend on the species and the background you are using. Please consider sharing your results from running the built-in model, as is discussed here.