Issue with train

atrlbck commented 4 years ago

Hi @mikeyEcology,

I was getting low accuracy with the empty_animal model, so I wanted to train an empty_animal model on my own images.

My data_info file has no header row: the image names are in the first column and a 1/0 presence/absence classification is in the second. (I also tried creating this file with the make_input function, but I get the same error at the end.)
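For illustration, a file with that layout could be produced by a short script. This is a minimal Python sketch, not part of MLWIC2: the helper name write_data_info and the image names are hypothetical, and the labels follow the 1/0 presence/absence coding described above.

```python
import csv
import io


def write_data_info(rows, handle):
    """Write (image_name, label) pairs as a headerless, comma-delimited CSV."""
    writer = csv.writer(handle, delimiter=",", lineterminator="\n")
    for image_name, label in rows:
        # Only the two classes described in the post are accepted.
        if label not in (0, 1):
            raise ValueError(f"label for {image_name} must be 0 or 1, got {label!r}")
        writer.writerow([image_name, label])


# Hypothetical image names; 1 = presence, 0 = absence.
example_rows = [("IMG_0001.JPG", 1), ("IMG_0002.JPG", 0)]

# An in-memory buffer stands in for the real data_info_train.csv file.
buffer = io.StringIO()
write_data_info(example_rows, buffer)
data_info_text = buffer.getvalue()
```

To write a real file instead, pass a handle opened with open(path, "w", newline="") in place of the buffer.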

When I run the train function

train(
  path_prefix = "/Users/armandotoralbecker/Desktop/RM Project/MLWIC2/images)",
  data_info = "/Users/armandotoralbecker/Desktop/RM Project/MLWIC2/data_info_train.csv",
  model_dir = "/Users/armandotoralbecker/Desktop/RM Project/MLWIC2/MLWIC2_helper_files",
  python_loc = "/anaconda3/bin/",
  os = "Mac",
  num_gpus = 2,
  num_classes = 2,
  delimiter = ",",
  architecture = "resnet",
  depth = "18",
  batch_size = 128,
  log_dir = "empty_animal",
  log_dir_train = "MLWIC2_train_output",
  retrain = TRUE,
  retrain_from = "empty_animal",
  num_epochs = 55,
  top_n = 5,
  num_cores = 1,
  randomize = TRUE,
  max_to_keep = 5,
  print_cmd = FALSE,
  shiny = FALSE
)

I get the error:

Error in UseMethod("train") : no applicable method for 'train' applied to an object of class "character"

Thank you for your help

mikeyEcology commented 4 years ago

I don't know that this is the problem, but what happens if you remove the ) from the end of your path_prefix?

atrlbck commented 4 years ago

Oops, sorry. Same error after removing ")"

mikeyEcology commented 4 years ago

I've never seen that one before. A couple of things to try: set top_n = 2, and call MLWIC2::train( instead of train(. Also, what version of R are you using? You can find out using R.Version()

atrlbck commented 4 years ago

Changing top_n to 2 and using MLWIC2::train() resolved the problem. I am using version 3.6.2 of R.

Thanks @mikeyEcology

mikeyEcology commented 4 years ago

The issue here was that you probably had another package loaded that has a function called train; using MLWIC2:: tells R that you specifically want to use the train function associated with this package. I'm glad it's working for you now.

atrlbck commented 4 years ago

Hi @mikeyEcology

I'm trying to classify the images with my trained model, but I am unable to. The MLWIC2_train_output folder created for the trained model is empty, and training on ~7700 images takes only about 22 seconds.

MLWIC2::classify(
  path_prefix = '/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/images',
  data_info = '/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/image_labels.csv',
  model_dir = '/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/MLWIC2_helper_files',
  log_dir = "MLWIC2_train_output",
  save_predictions = "model_predictions.txt",
  python_loc = "/anaconda3/bin/",
  os = "Mac",
  num_classes = 2,
  num_cores = 1,
  delimiter = ",",
  architecture = "resnet",
  depth = "18",
  top_n = 2,
  batch_size = 128,
  num_gpus = 2,
  make_output = TRUE,
  output_location = NULL,
  output_name = "MLWIC2_output_train.csv",
  test_tensorflow = TRUE,
  shiny = FALSE,
  print_cmd = FALSE
)

Your data_info file exists: /Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/image_labels.csv.
Your `path_prefix exists: /Users/armandotoralbecker/Desktop/RM_Project.
You are not using a Windows computer.
Tensorflow and Python are properly installed.
You are running tensorflow version 1.4.0
Now proceeding to run classify.
2020-04-04 12:12:40.166178: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Namespace(LR_details='19, 30, 44, 53, 0.01, 0.005, 0.001, 0.0005, 0.0001', LR_policy='piecewise_linear', WD_details='30, 0.0005, 0.0', WD_policy='piecewise_linear', architecture='resnet', batch_size=128, chunked_batch_size=64, command='eval', delimiter=',', depth=18, log_debug_info=False, log_device_placement=False, log_dir='MLWIC2_train_output', max_to_keep=5, num_batches=-1, num_classes=1000, num_epochs=55, num_gpus=2, num_prefetch=2000, num_threads=1, optimizer='momentum', path_prefix='/Users/armandotoralbecker/Desktop/RM_Project', processed_size=[224, 224, 3], raw_size=[256, 256, 3], retrain_from=None, run_metadata=None, run_name='Run-04-04-2020_12-12-40', run_options=None, save_predictions='/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/MLWIC2_helper_files/model_predictions.txt', shuffle=True, snapshot_prefix='MLWIC2_train_output', top_n=2, train_info=None, transfer_mode=[0], val_info='/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/image_labels.csv')
found 2 classes {0: 0, 1: 1}
Filling queue with 2000 images before starting to train. This may take some time.
Traceback (most recent call last):
  File "run.py", line 412, in <module>
    main()
  File "run.py", line 394, in main
    do_evaluate(sess, args)
  File "run.py", line 236, in do_evaluate
    dnn_model.load(sess, args.log_dir)
  File "/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/MLWIC2_helper_files/architectures/model.py", line 279, in load
    self.pretrained_loader.restore(sess, ckpt.model_checkpoint_path)
AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'
The classify function did not run properly.
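The AttributeError is consistent with the empty output folder described above. In TensorFlow 1.x, tf.train.get_checkpoint_state returns None when a directory holds no checkpoint state. Assuming model.py obtains ckpt that way (only line 279 is visible here, so this is a guess), dereferencing it would fail exactly like this. A minimal Python sketch of a pre-flight check, using a hypothetical helper named has_checkpoint that looks for the "checkpoint" index file a TensorFlow Saver writes next to its snapshots:

```python
import os
import tempfile


def has_checkpoint(log_dir):
    """Return True if log_dir contains a TensorFlow 'checkpoint' index file."""
    return os.path.isfile(os.path.join(log_dir, "checkpoint"))


# Demonstrate on a temporary directory rather than a real MLWIC2 folder.
with tempfile.TemporaryDirectory() as demo_dir:
    # An empty directory, like the MLWIC2_train_output folder described above.
    empty_result = has_checkpoint(demo_dir)

    # Simulate what a successful training run would leave behind.
    # "snapshot-0" is a made-up checkpoint name for this demo.
    with open(os.path.join(demo_dir, "checkpoint"), "w") as fh:
        fh.write('model_checkpoint_path: "snapshot-0"\n')
    filled_result = has_checkpoint(demo_dir)
```

If a check like this returns False for the directory passed as log_dir, classify has nothing to load, and the training run needs fixing first.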

This is the console output for MLWIC2::train()

2020-04-04 12:56:16.083822: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
Namespace(LR_details='19, 30, 44, 53, 0.01, 0.005, 0.001, 0.0005, 0.0001', LR_policy='piecewise_linear', WD_details='30, 0.0005, 0.0', WD_policy='piecewise_linear', architecture='resnet', batch_size=128, chunked_batch_size=64, command='train', delimiter=',', depth=18, log_debug_info=False, log_device_placement=False, log_dir='MLWIC2_train_output', max_to_keep=5, num_batches=-1, num_classes=2, num_epochs=55, num_gpus=2, num_prefetch=2000, num_threads=1, optimizer='momentum', path_prefix='/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/images', processed_size=[224, 224, 3], raw_size=[256, 256, 3], retrain_from='empty_animal', run_metadata=None, run_name='Run-04-04-2020_12-56-16', run_options=None, save_predictions='predictions.csv', shuffle=True, snapshot_prefix='snapshot', top_n=2, train_info='data_info_train.csv', transfer_mode=[0], val_info=None)
Saving everything in MLWIC2_train_output
found 2 classes {0: 0, 1: 1}
Filling queue with 2000 images before starting to train. This may take some time.
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [1000]
  [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@output/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](output/biases, save/RestoreV2_2)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 412, in <module>
    main()
  File "run.py", line 383, in main
    do_train(sess, args)
  File "run.py", line 100, in do_train
    dnn_model.load(sess, args.retrain_from)
  File "/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/MLWIC2_helper_files/architectures/model.py", line 279, in load
    self.pretrained_loader.restore(sess, ckpt.model_checkpoint_path)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1666, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [1000]
  [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@output/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](output/biases, save/RestoreV2_2)]]

Caused by op 'save/Assign_2', defined at:
  File "run.py", line 412, in <module>
    main()
  File "run.py", line 383, in main
    do_train(sess, args)
  File "run.py", line 100, in do_train
    dnn_model.load(sess, args.retrain_from)
  File "/Users/armandotoralbecker/Desktop/RM_Project/MLWIC2/MLWIC2_helper_files/architectures/model.py", line 276, in load
    self.pretrained_loader = tf.train.Saver(tf.get_collection(SAVE_VARIABLES))
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1218, in __init__
    self.build()
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1227, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1263, in _build
    build_save=build_save, build_restore=build_restore)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
    restore_sequentially, reshape)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 439, in _AddRestoreOps
    assign_ops.append(saveable.restore(tensors, shapes))
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 160, in restore
    self.op.get_shape().is_fully_defined())
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
    validate_shape=validate_shape)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 57, in assign
    use_locking=use_locking, name=name)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [2] rhs shape= [1000]
  [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@output/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](output/biases, save/RestoreV2_2)]]

the train function ran for 22.6064670085907 secs. The trained model is in MLWIC2_train_output. Specify this directory as the log_dir when you use classify().

Thank you for your continued help

mikeyEcology commented 4 years ago

In your train call, try setting retrain = FALSE
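The training traceback suggests why this may help. With retrain = TRUE, the run begins by restoring the retrain_from checkpoint, and that restore fails because the checkpoint's output/biases variable has shape [1000] while the new two-class model expects shape [2]. The run appears to abort before any checkpoint is written, which would also explain the 22-second runtime and the empty MLWIC2_train_output folder. The mismatch can be illustrated with a small Python sketch. It does not use TensorFlow, the function name find_shape_mismatches is hypothetical, and the shapes are copied from the error message.

```python
def find_shape_mismatches(model_shapes, checkpoint_shapes):
    """Return {name: (model_shape, checkpoint_shape)} for variables whose shapes differ."""
    mismatches = {}
    for name, model_shape in model_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        # Variables missing from the checkpoint are skipped; only conflicts are reported.
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(model_shape):
            mismatches[name] = (tuple(model_shape), tuple(ckpt_shape))
    return mismatches


# Shapes from the error: lhs shape= [2] (new model), rhs shape= [1000] (checkpoint).
model_shapes = {"output/biases": (2,)}
checkpoint_shapes = {"output/biases": (1000,)}

mismatches = find_shape_mismatches(model_shapes, checkpoint_shapes)
for name, (wanted, found) in mismatches.items():
    print(f"{name}: model expects {wanted}, checkpoint has {found}")
```

Training from scratch with retrain = FALSE skips the restore step, so the two-class output layer never has to match a stored one.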

mikeyEcology / MLWIC2

Issue with train #5