Open kow10120 opened 5 years ago
Did you try adding a "/" to the end of your path_prefix? And maybe your model_dir, not sure. I know classify has some code that deals with missing slashes; I'm not sure whether train does without doing some digging.
This does not look like an error message from MLWIC. You might have another package loaded that has a function called train. A way to be sure you're using the correct function is to be more specific when you call it, so try using MLWIC::train() instead.
Thank you both for the help. I've tried both of your suggestions. Mikey, following your suggestion to use MLWIC::train(), I am now getting different output:
MLWIC::train(path_prefix = "F:/IERCMLWIC/TRAININGIMAGES",
data_info = "F:/IERCMLWIC/L1/data_info.csv",
model_dir = "F:/IERCMLWIC",
python_loc = "C:/Users/kvanatta/Anaconda3/",
num_classes = 24,
log_dir_train = "IERCMLWIC"
)
C:\Users\kvanatta\ANACON~1\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "train.py", line 339, in <module>
main()
File "train.py", line 319, in main
args.num_samples = sum(1 for line in open(args.data_info))
FileNotFoundError: [Errno 2] No such file or directory: 'data_info_train.csv'
[1] "training of model took 2.23065400123596 secs. The trained model is in IERCMLWIC. Specify this directory as the log_dir when you use classify(). ""
It appears that tensorflow was masking the train() function. I have tried every combination of trailing slashes on path_prefix and model_dir as suggested by Nova-Scotia, but the output shown above does not change.
It seems like for some reason, there are issues when you try to use the package when your files are on an external drive. (I'm assuming that your F drive is external?)
One potential solution is to move and rename your data_info.csv file manually. So rename this file data_info_train.csv and make sure that it is in the folder F:/IERCMLWIC/L1/.
You also might want to try setting retrain = FALSE, but this probably wouldn't fix the error that you're getting.
@kow10120, did you get this to work? The fix @mikeyEcology suggested (rename the .csv to data_info_train.csv) worked for me, after I got the same errors that you noted.
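For anyone who would rather script the workaround than rename by hand, here is a minimal Python sketch. It assumes, as the traceback above suggests, that train.py opens a file literally named data_info_train.csv inside the L1 folder of model_dir. The helper name stage_data_info is hypothetical and not part of MLWIC.

```python
import shutil
from pathlib import Path


def stage_data_info(data_info, model_dir, target_name="data_info_train.csv"):
    """Copy the label file into <model_dir>/L1 under the name train.py opens.

    Hypothetical helper, not part of MLWIC. The original file is left
    untouched; the path of the copy is returned.
    """
    source = Path(data_info)
    if not source.is_file():
        raise FileNotFoundError(f"label file not found: {source}")
    l1_dir = Path(model_dir) / "L1"
    if not l1_dir.is_dir():
        raise NotADirectoryError(f"expected an L1 folder inside {model_dir}")
    target = l1_dir / target_name
    # Skip the copy when the file already has the expected name and location.
    if source.resolve() != target.resolve():
        shutil.copyfile(source, target)
    return target


# Example call with the paths from this thread (adjust to your machine):
# stage_data_info("F:/IERCMLWIC/L1/data_info.csv", "F:/IERCMLWIC")
```

Copying rather than renaming keeps the original file in place, so the data_info argument you pass to train() still points at something that exists.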
Thank you for the advice, and sorry that I have not had much free time to devote to this recently. I did not get it to work, but I may need to spend more time carefully combing through the large Excel file I'm working with and the actual photos to ensure there are no discrepancies. The joys of large data sets. Thank you for the help, and I will update when I am ready to proceed again.
Hopefully you're not going through the file names manually? There are ways to do this in Unix that can save you a lot of time. In Unix, I would go to the directory where I have the files and type find $PWD -type f > listOfFiles.txt, which would create a file in my directory called listOfFiles.txt with the whole list. Presumably Windows has a similar function.
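Since Anaconda is already installed for MLWIC, one option that behaves the same on Windows and Unix is a few lines of Python. This is only a sketch of the same idea as the find command; the function name list_files is made up for illustration.

```python
import os


def list_files(root, out_path="listOfFiles.txt"):
    """Write the absolute path of every file under root, one per line.

    Hypothetical helper. Returns the number of paths written.
    """
    root = os.path.abspath(root)
    out_abs = os.path.abspath(out_path)
    count = 0
    with open(out_abs, "w", encoding="utf-8") as out:
        # os.walk visits root and every folder below it.
        for folder, _subdirs, names in os.walk(root):
            for name in sorted(names):
                full = os.path.join(folder, name)
                if full == out_abs:
                    continue  # unlike find, leave the output file out of its own list
                out.write(full + "\n")
                count += 1
    return count


# Example (the folder is the one used earlier in this thread):
# list_files("F:/IERCMLWIC/TRAININGIMAGES", "listOfFiles.txt")
```

The resulting text file can then be compared against the file-name column of the spreadsheet instead of checking photos one by one.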
Had this same error and just wanted to say that changing the name to data_info_train.csv worked. I then ran into another error in train that I'm hoping might be obvious. I'm guessing this has to do with the num_classes argument? I have 23 classes that I'm trying to train with; have I missed a step somewhere to specify this?
So far I have only done so in the line: python_loc = "C:\Users\User\Anaconda3\", num_classes = 23, log_dir_train = "traindir"
Assign requires shapes of both tensors to match. lhs shape= [23] rhs shape= [28] [[node save/Assign_1 (defined at train.py:198) ]]
@tundraboyce did you specify retrain=FALSE? Can you please post all of the code that you put in the train function and all of the output?
This was previously not explained in the readme. I just updated it so that this is more clear:
G) If your num_classes is not equal to the number in the built in model (num_classes != 28), you will need to specify retrain=FALSE.
That sorted it, thanks again! Appreciate the responses.
Now I just have " InvalidArgumentError (see above for traceback): targets[14] is out of range [[node Tower_1/in_top_k/InTopKV2 (defined at train.py:127) ]]"
Hey @Nova-Scotia, since you're the Windows expert on MLWIC, I'm wondering if you can try something for me when you get a chance. I updated train and classify so that they should properly move the data_info file on a Windows computer if you set os = "Windows" in the function call. I don't have a way to test it, though, because I'm only running Linux.
Hi @mikeyEcology , sure, I can do that - might take me a couple days to get to it (busy week!) but I'll keep you posted. Let me know if another user gets to it first!
Erica
Thank you @Nova-Scotia !
Hi again. Did a quick check of classify, haven't tried train yet. It wasn't working so I dug into the code and realized maybe there's an easy fix?
In classify the code tells R to make a new file named "data_info_train.csv":
if (os == "Windows") {
data_file <- read.table(data_info, header = FALSE, sep = ",")
output.file <- file("data_info_train.csv", "wb")
write.table(data_file, file = output.file, append = TRUE,
quote = FALSE, row.names = FALSE, col.names = FALSE,
sep = ",")
close(output.file)
rm(output.file)
but then later in the code it calls for "data_info.csv":
eval_py <- paste0(python_loc, "python eval.py --architecture ",
architecture, " --depth ", depth, " --log_dir ", log_dir,
" --path_prefix ", path_prefix, " --batch_size 128 --data_info data_info.csv",
" --delimiter ", delimiter, " --save_predictions ", save_predictions,
" --top_n ", top_n, " --num_classes=", num_classes, "\n")
Maybe just a typo when copy-pasting from the train code?
I actually managed to get train to work this morning and I have another computer chugging away on it now. The issue you described was pretty spot on.
Train was looking for a data_info_train.csv regardless of what was called in the code. I was trying to call a different file (e.g., data_info_pilot.csv) but the code would only work with, and only look for, "data_info_train" in the L1 folder. Also, my out-of-range error came from num_classes = 23: my 23 classes plus class 0 make 24, duh. Brain freeze.
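The off-by-one above is easy to check before training. A small Python sketch, assuming class IDs are zero-based integers (TensorFlow's in_top_k rejects any target that is not below the number of classes); both function names are hypothetical:

```python
def required_num_classes(labels):
    """Return the smallest num_classes that keeps every label in range.

    Assumes zero-based integer class IDs, so labels 0..23 need 24 classes.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    if min(labels) < 0:
        raise ValueError("class IDs must be non-negative")
    return max(labels) + 1


def out_of_range(labels, num_classes):
    """Return the distinct labels that fall outside 0..num_classes-1."""
    return sorted({label for label in labels if not 0 <= label < num_classes})
```

With class IDs 0 through 23, required_num_classes gives 24, and out_of_range(labels, 23) reports the ID 23 that triggers the "out of range" error.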
I'll let you know how well classify works with this model on my images.
Just an update - the classify command does work as expected if you change "data_info_train.csv" to "data_info.csv" in the source code.
Thank you @Nova-Scotia and @tundraboyce for testing this. I corrected the error that you suggested with classify. That's what I get for trying to copy and paste.
Hi! I'm trying to train a model with my species, but I'm getting a different error from the ones mentioned in this topic.
The input I'm using is:
MLWIC::train(
path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa/",
data_info = "C:/Users/User/Documents/CameraTrap/L1/data_info_train.csv",
model_dir = "C:/Users/User/Documents/CameraTrap",
python_loc = "C:/Users/User/Anaconda3/",
os = "Windows",
num_classes = 51,
delimiter = ",",
architecture = "resnet",
depth = "152",
batch_size = "64",
log_dir_train = "angola_output",
retrain = FALSE,
print_cmd = FALSE )
and I get the following output:
C:\Users\User\ANACON~1\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Namespace(LR_steps=[19, 30, 44, 53], LR_values=[0.01, 0.005, 0.001, 0.0005, 0.0001], WD_steps=[30], WD_values=[0.0005, 0.0], architecture='resnet', batch_size=64, chunked_batch_size=32, crop_size=[224, 224], data_info='data_info_train.csv', delimiter=',', depth=152, load_size=[256, 256], log_debug_info=False, log_device_placement=False, log_dir='angola_output', num_batches=3095, num_channels=3, num_classes=51, num_epochs=55, num_gpus=2, num_samples=198073, num_threads=20, path_prefix='C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa/', retrain_from=None, run_name='Run-31-08-2019_19-19-32', shuffle=True, snapshot_prefix='snapshot', top_n=2, transfer_mode=[0])
Saving everything in angola_output
Traceback (most recent call last):
File "train.py", line 339, in <module>
main()
File "train.py", line 335, in main
train(args)
File "train.py", line 99, in train
images, labels = data_loader.read_inputs(True, args)
File "C:\Users\User\Documents\CameraTrap\L1\data_loader.py", line 23, in read_inputs
filepaths, labels = _read_label_file(args.data_info, args.delimiter)
File "C:\Users\User\Documents\CameraTrap\L1\data_loader.py", line 19, in _read_label_file
labels.append(int(tokens[1]))
ValueError: invalid literal for int() with base 10: 'NA\n'
[1] "training of model took 3.71519708633423 secs. The trained model is in angola_output. Specify this directory as the log_dir when you use classify(). "
Can somebody help me with this? Thank you!
Hi @pirocha, I'm not sure exactly what the problem is, but to rule some things out, can you try running this:
MLWIC::train(
path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa",
data_info = "C:/Users/User/Documents/CameraTrap/L1/data_info_train.csv",
model_dir = "C:/Users/User/Documents/CameraTrap",
python_loc = "C:/Users/User/Anaconda3/",
os = "Windows",
num_classes = 51,
delimiter = ",",
architecture = "resnet",
depth = "18",
#depth = "152",
#batch_size = "64",
batch_size = "128",
log_dir_train = "angola_output",
retrain = FALSE,
print_cmd = FALSE )
Hi @mikeyEcology,
I tried to change the depth and batch_size as you suggested but I still get the same output.
I know nothing about programming, but is it possible that the Python script is treating something as an integer that is actually a float in my data, for instance in data_info_train.csv?
Did you also try re-setting your path prefix to:
path_prefix = "C:/Users/User/Documents/CameraTrap/MLWIC_examples-master/images_africa"
Hi Mikeyecology, you mean without the slash at the end, don't you? Yes, I tried both ways.
I tested the train command with the example you provided and it worked perfectly, so it had to be something with my files. I re-checked my 'data_info_train.csv' and found some NAs in the species column. I'm sorry for bothering you; it was only a mistake I made! Anyway, apparently it's running now.
Thank you very much, Filipe
Ok. No worries. Yes, NAs in the input file will cause errors. Glad you got it running.
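A quick way to catch this before a training run is to scan the label column yourself. The sketch below mirrors the int(tokens[1]) conversion visible in the traceback above and assumes the class ID is the second field of each row; the function name find_bad_label_rows is hypothetical and not part of MLWIC.

```python
def find_bad_label_rows(data_info, delimiter=","):
    """Return (line_number, raw_label) for rows whose second field is not an integer.

    Hypothetical helper. Rows with a missing second field are reported
    with an empty string as the raw label.
    """
    bad = []
    with open(data_info, "r") as handle:
        for line_number, line in enumerate(handle, start=1):
            tokens = line.rstrip("\r\n").split(delimiter)
            raw = tokens[1] if len(tokens) > 1 else ""
            try:
                int(raw)  # same conversion data_loader.py applies to each label
            except ValueError:
                bad.append((line_number, raw))
    return bad


# Example (path taken from this thread):
# find_bad_label_rows("C:/Users/User/Documents/CameraTrap/L1/data_info_train.csv")
```

An empty list means every label converts cleanly; anything else points at the rows (NA, blanks, decimals such as 3.0) that need fixing in the .csv.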
Hello all,
I am trying to train a model on a Windows computer. When I input the following:
I get the following output:
Does anyone have experience with this?