mikeyEcology / MLWIC

Machine Learning for Wildlife Image Classification

Error while training #16

Open kow10120 opened 5 years ago

kow10120 commented 5 years ago

Hello all,

I am trying to train a model on a Windows computer. When I input the following:

train(path_prefix = "F:/IERCMLWIC/TrainingImages/TRAININGIMAGES",
      data_info = "F:/IERCMLWIC/L1/data_info.csv",
      model_dir = "F:/IERCMLWIC",
      python_loc = "C:/Users/kvanatta/Anaconda3/",
      num_classes = 24,
      log_dir_train = "IERCMLWIC"
      )

I get the following output:

train(path_prefix = "F:/IERCMLWIC/TrainingImages/TRAININGIMAGES",
      data_info = "F:/IERCMLWIC/L1/data_info.csv",
      model_dir = "F:/IERCMLWIC",
      python_loc = "C:/Users/kvanatta/Anaconda3/",
      num_classes = 24,
      log_dir_train = "IERCMLWIC"
      )
Error in UseMethod("train") : 
  no applicable method for 'train' applied to an object of class "character"

Does anyone have experience with this?

Nova-Scotia commented 5 years ago

Did you try adding a "/" to the end of your path_prefix?

Nova-Scotia commented 5 years ago

And maybe to your model_dir too, I'm not sure. I know classify has some code that deals with missing slashes, but I'm not sure whether train does without doing some digging.

mikeyEcology commented 5 years ago

This does not look like an error message from MLWIC. You might have another package loaded that has a function called train. A way to be sure you're using the correct function is to be more specific when you use it, so try using MLWIC::train() instead.

kow10120 commented 5 years ago

Thank you both for the help. I've tried both of your suggestions. Mikey following your suggestion to use MLWIC::train() I am now getting different output:

 MLWIC::train(path_prefix = "F:/IERCMLWIC/TRAININGIMAGES",  
          data_info = "F:/IERCMLWIC/L1/data_info.csv", 
          model_dir = "F:/IERCMLWIC", 
          python_loc = "C:/Users/kvanatta/Anaconda3/",  
          num_classes = 24, 
          log_dir_train = "IERCMLWIC" 
          )
C:\Users\kvanatta\ANACON~1\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "train.py", line 339, in <module>
    main()
  File "train.py", line 319, in main
    args.num_samples = sum(1 for line in open(args.data_info))
FileNotFoundError: [Errno 2] No such file or directory: 'data_info_train.csv'
[1] "training of model took 2.23065400123596 secs. The trained model is in IERCMLWIC. Specify this directory as the log_dir when you use classify(). "

It appears that tensorflow was masking the train() function. I have tried every combination of trailing slashes on path_prefix and model_dir as suggested by Nova-Scotia, but the output shown above does not change.

mikeyEcology commented 5 years ago

It seems like for some reason, there are issues when you try to use the package when your files are on an external drive. (I'm assuming that your F drive is external?)

One potential solution is to move and rename your data_info.csv file manually. So rename this file data_info_train.csv and make sure that it is in the folder F:/IERCMLWIC/L1/.
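The rename-and-move workaround can also be scripted. Below is a minimal Python sketch, assuming the only requirement is that a copy of the label file named data_info_train.csv sits in the L1 folder; the function name `stage_data_info` is hypothetical and is not part of MLWIC.

```python
import shutil
from pathlib import Path


def stage_data_info(data_info, l1_dir, target_name="data_info_train.csv"):
    """Copy the label file into l1_dir under the name train.py looks for.

    Hypothetical helper, not part of MLWIC. The original file is left
    untouched; only a copy with the expected name is created.
    """
    src = Path(data_info)
    dst = Path(l1_dir) / target_name
    if not src.is_file():
        raise FileNotFoundError(f"label file not found: {src}")
    # Skip the copy when the file is already in place under that name.
    if src.resolve() != dst.resolve():
        shutil.copyfile(src, dst)
    return dst


# Example call with the paths from this thread (adjust to your setup):
# stage_data_info("F:/IERCMLWIC/L1/data_info.csv", "F:/IERCMLWIC/L1")
```

Copying rather than renaming keeps the original data_info.csv around in case you need to go back to it.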

You also might want to try setting retrain = FALSE, but this probably wouldn't fix the error that you're getting.

Nova-Scotia commented 5 years ago

@kow10120 , did you get this to work? The fix @mikeyEcology suggested (rename the .csv to data_info_train.csv) worked for me, after I got the same errors that you noted.

kow10120 commented 5 years ago

Thank you for the advice, sorry I have not had much free time to devote to this recently. I did not get it to work, but I may need to spend more time carefully combing through the large excel file I'm working with and the actual photos to ensure there are no discrepancies. The joys of large data sets. Thank you for the help, and I will update when I am ready to proceed again.

mikeyEcology commented 5 years ago

Hopefully you're not going through the file names manually? There are ways to do this in Unix that can save you a lot of time. In Unix, I would go to the directory where I have the files and type find $PWD -type f > listOfFiles.txt, which would create a file in my directory called listOfFiles.txt with the whole list. Presumably Windows has a similar function.
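For Windows users without a Unix shell, roughly the same listing can be produced from Python, which is already installed for MLWIC. This is only a sketch of an equivalent to the find command above; the function names `list_files` and `write_file_list` are made up for illustration.

```python
import os


def list_files(root):
    """Return the absolute paths of all regular files under root, sorted."""
    root = os.path.abspath(root)
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def write_file_list(root, out_path):
    """Write one file path per line to out_path and return the count."""
    # The listing is collected before out_path is opened, so the output
    # file does not list itself on a first run.
    paths = list_files(root)
    with open(out_path, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(path + "\n")
    return len(paths)


# Example call (adjust the folder to wherever your images live):
# write_file_list("F:/IERCMLWIC/TRAININGIMAGES", "listOfFiles.txt")
```

The resulting text file can then be compared against the file names in your data_info file instead of checking them by hand.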

tundraboyce commented 5 years ago

I had this same error and just wanted to say that changing the name to data_info_train.csv worked. I've run into another error in train that I'm hoping might be obvious. I'm guessing it has to do with the num_classes argument? I have 23 classes that I'm trying to train with; have I missed a step somewhere to specify this?

So far I have only specified it in this line: python_loc = "C:\Users\User\Anaconda3\", num_classes = 23, log_dir_train = "traindir"

Assign requires shapes of both tensors to match. lhs shape= [23] rhs shape= [28] [[node save/Assign_1 (defined at train.py:198) ]]

mikeyEcology commented 5 years ago

@tundraboyce did you specify retrain=FALSE? Can you please post all of the code that you put in the train function and all of the output?

mikeyEcology commented 5 years ago

This was previously not explained in the readme. I just updated it so that this is more clear:

G) If your num_classes is not equal to the number in the built in model (num_classes != 28), you will need to specify retrain=FALSE.

tundraboyce commented 5 years ago

That sorted it, thanks again! Appreciate the responses.

Now I just have " InvalidArgumentError (see above for traceback): targets[14] is out of range [[node Tower_1/in_top_k/InTopKV2 (defined at train.py:127) ]]"

mikeyEcology commented 5 years ago

Hey @Nova-Scotia, since you're the Windows expert on MLWIC, I'm wondering if you can try something for me when you get a chance. I updated train and classify so that they should properly move the data_info file on a Windows computer if you set os = "Windows" in the function call. I don't have a way to test it, though, because I'm only running Linux.

Nova-Scotia commented 5 years ago

Hi @mikeyEcology , sure, I can do that - might take me a couple days to get to it (busy week!) but I'll keep you posted. Let me know if another user gets to it first!

Erica

mikeyEcology commented 5 years ago

Thank you @Nova-Scotia !

Nova-Scotia commented 5 years ago

Hi again. Did a quick check of classify, haven't tried train yet. It wasn't working so I dug into the code and realized maybe there's an easy fix?

In classify, the code tells R to make a new file named "data_info_train.csv":

if (os == "Windows") {
        data_file <- read.table(data_info, header = FALSE, sep = ",")
        output.file <- file("data_info_train.csv", "wb")
        write.table(data_file, file = output.file, append = TRUE, 
            quote = FALSE, row.names = FALSE, col.names = FALSE, 
            sep = ",")
        close(output.file)
        rm(output.file)

but then later in the code it calls for "data_info.csv":

 eval_py <- paste0(python_loc, "python eval.py --architecture ", 
        architecture, " --depth ", depth, " --log_dir ", log_dir, 
        " --path_prefix ", path_prefix, " --batch_size 128 --data_info data_info.csv", 
        " --delimiter ", delimiter, " --save_predictions ", save_predictions, 
        " --top_n ", top_n, " --num_classes=", num_classes, "\n")

Maybe just a typo when copy-pasting from train code?

tundraboyce commented 5 years ago

I actually managed to get train to work this morning and I have another computer chugging away on it now. The issue you described was pretty spot on.

Train was looking for a data_info_train.csv regardless of what was called in the code. I was trying to call a different file (e.g., data_info_pilot.csv), but the code would only work with, and only look for, "data_info_train" in the L1 folder. Also, my out-of-range error came from num_classes = 23: my classes are numbered from 0, so 23 plus the 0 class = 24, duh. Brain freeze.
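A quick way to catch this kind of off-by-one before training is to compute the class count from the label file itself. The sketch below assumes the layout implied by the traceback in this thread (file name in the first column, an integer class ID counted from 0 in the second); `required_num_classes` is a hypothetical helper, not an MLWIC function.

```python
import csv


def required_num_classes(csv_path, delimiter=","):
    """Return the smallest num_classes that covers every label in the file.

    Assumes zero-based integer class IDs in the second column. A value
    that is not an integer (for example NA) raises ValueError here.
    """
    highest = -1
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh, delimiter=delimiter):
            if len(row) < 2:
                continue  # skip blank or one-column lines
            highest = max(highest, int(row[1]))
    # IDs run from 0 to highest, so the class count is one more.
    return highest + 1


# Example call (adjust the path to your own label file):
# required_num_classes("data_info_train.csv")
```

If the returned number is larger than the num_classes you pass to train, at least one label falls outside the range the model expects.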

I'll let you know how well classify works with this model on my images.

Nova-Scotia commented 5 years ago

Just an update - the classify command does work as expected if you change "data_info_train.csv" to "data_info.csv" in the source code.

mikeyEcology commented 5 years ago

Thank you @Nova-Scotia and @tundraboyce for testing this. I corrected the error that you suggested with classify. That's what I get for trying to copy and paste.

pirocha commented 5 years ago

Hi! I'm trying to train a model with my species, but I'm getting a different error you mentioned in this topic.

The input I'm using is:

MLWIC::train(
    path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa/", 
    data_info = "C:/Users/User/Documents/CameraTrap/L1/data_info_train.csv",
    model_dir = "C:/Users/User/Documents/CameraTrap", 
    python_loc = "C:/Users/User/Anaconda3/", 
    os = "Windows", 
    num_classes = 51, 
    delimiter = ",", 
    architecture = "resnet", 
    depth = "152", 
    batch_size = "64",
    log_dir_train = "angola_output", 
    retrain = FALSE, 
    print_cmd = FALSE )

and I get the following output:

C:\Users\User\ANACON~1\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Namespace(LR_steps=[19, 30, 44, 53], LR_values=[0.01, 0.005, 0.001, 0.0005, 0.0001], WD_steps=[30], WD_values=[0.0005, 0.0], architecture='resnet', batch_size=64, chunked_batch_size=32, crop_size=[224, 224], data_info='data_info_train.csv', delimiter=',', depth=152, load_size=[256, 256], log_debug_info=False, log_device_placement=False, log_dir='angola_output', num_batches=3095, num_channels=3, num_classes=51, num_epochs=55, num_gpus=2, num_samples=198073, num_threads=20, path_prefix='C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa/', retrain_from=None, run_name='Run-31-08-2019_19-19-32', shuffle=True, snapshot_prefix='snapshot', top_n=2, transfer_mode=[0])
Saving everything in angola_output
Traceback (most recent call last):
  File "train.py", line 339, in <module>
    main()
  File "train.py", line 335, in main
    train(args)
  File "train.py", line 99, in train
    images, labels = data_loader.read_inputs(True, args)
  File "C:\Users\User\Documents\CameraTrap\L1\data_loader.py", line 23, in read_inputs
    filepaths, labels = _read_label_file(args.data_info, args.delimiter)
  File "C:\Users\User\Documents\CameraTrap\L1\data_loader.py", line 19, in _read_label_file
    labels.append(int(tokens[1]))
ValueError: invalid literal for int() with base 10: 'NA\n'
[1] "training of model took 3.71519708633423 secs. The trained model is in angola_output. Specify this directory as the log_dir when you use classify(). "

Can somebody help me with this? Thank you!

mikeyEcology commented 5 years ago

Hi @pirocha, I'm not sure exactly what the problem is, but to rule some things out, can you try running this

MLWIC::train(
    path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa", 
    data_info = "C:/Users/User/Documents/CameraTrap/L1/data_info_train.csv",
    model_dir = "C:/Users/User/Documents/CameraTrap", 
    python_loc = "C:/Users/User/Anaconda3/", 
    os = "Windows", 
    num_classes = 51, 
    delimiter = ",", 
    architecture = "resnet", 
    depth = "18", 
    #depth = "152", 
    #batch_size = "64",
    batch_size = "128",
    log_dir_train = "angola_output", 
    retrain = FALSE, 
    print_cmd = FALSE )

pirocha commented 5 years ago

Hi @mikeyEcology, I tried changing the depth and batch_size as you suggested, but I still get the same output. I know nothing about programming, but is it possible that the Python script expects an integer somewhere and my data contains a float instead? For instance, in data_info_train.csv?

mikeyEcology commented 5 years ago

Did you also try resetting your path prefix to: path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa"?

pirocha commented 5 years ago

Hi Mikeyecology, you mean without the slash at the end, don't you? Yes, I tried both ways.

I tested the train command with the example you provided and it worked perfectly, so it had to be something with my files. I re-checked my 'data_info_train.csv' and found some NAs in the species column. I'm sorry for bothering you; it was only a mistake I made! Anyway, apparently it's running now.

Thank you very much, Filipe


mikeyEcology commented 5 years ago

Ok. No worries. Yes, NAs in the input file will cause errors. Glad you got it running.
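For anyone else who hits the same ValueError, the label file can be scanned for such entries before training. This is a minimal sketch, assuming the class ID sits in the second column as the traceback above suggests; `find_bad_labels` is a hypothetical name and not part of MLWIC.

```python
import csv


def find_bad_labels(csv_path, delimiter=","):
    """Return (line_number, value) pairs whose label is not an integer.

    Assumes the class ID is in the second column. Rows with no second
    column are reported with an empty string as the value.
    """
    bad = []
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh, delimiter=delimiter)
        for line_no, row in enumerate(reader, start=1):
            label = row[1].strip() if len(row) > 1 else ""
            try:
                int(label)
            except ValueError:
                bad.append((line_no, label))
    return bad


# Example call (adjust the path to your own label file):
# for line_no, value in find_bad_labels("data_info_train.csv"):
#     print(line_no, repr(value))
```

An empty result means every row has an integer label; anything else points to the rows to fix or drop.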