Closed hannaboe closed 5 years ago
For your image labels, are you labeling them as 0, 1, 2, 3, 4? Or do you have a different labeling system?
I labeled them as 0, 1, 2, 3, 4. Using 0 for the first species, 1 for the second and so on.
It sounds like you're doing everything right and I'm not sure what's going on. The only other potential problem could be if your csv doesn't have Unix linebreaks?
A temporary fix for you could be to trick it by saying that num_classes = 28, but if you keep your labeling scheme it won't affect the results.
Thank you. Using num_classes = 28 helped.
I can start training a model now and it runs for quite a while. However, at some point (around epoch 97 or 98) it always crashes, and it seems to me that a file is missing in 'training_output'.
I tried training a model several times with different numbers of images and classes, but I always end up with the same error. I also tried to run classify() with my model, but that doesn't work.
This is the last part of my output:
2019-01-12 00:26:38.287313: epoch 98, step 1110, loss = 0.01, Top-1 = 1.00 Top-5 = 1.00 (40.8 examples/sec; 1.569 sec/batch)
2019-01-12 00:27:09.185133: epoch 98, step 1120, loss = 0.11, Top-1 = 0.97 Top-5 = 1.00 (41.9 examples/sec; 1.529 sec/batch)
Traceback (most recent call last):
File "train.py", line 339, in
I have the same issue, only I have more than 28 species in my dataset, so I need to increase the number of classes. This is my error:
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Assign requires shapes of both tensors to match. lhs shape= [512,31] rhs shape= [512,28] [[node save/Assign_4 (defined at train.py:198) = Assign[T=DT_FLOAT, _class=["loc:@output/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](output/weights/Momentum, save/RestoreV2:4)]]
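The shape mismatch above can be illustrated outside TensorFlow. A minimal NumPy sketch (function and variable names are my own, not MLWIC or TensorFlow code) of the check that restoring a checkpoint performs:

```python
import numpy as np

def restore_weights(graph_vars, checkpoint):
    # Mimics TensorFlow's Assign op during checkpoint restore: a saved
    # tensor can only be loaded into a variable of exactly the same shape.
    for name, saved in checkpoint.items():
        lhs, rhs = graph_vars[name].shape, saved.shape
        if lhs != rhs:
            raise ValueError(
                "Assign requires shapes of both tensors to match. "
                f"lhs shape= {list(lhs)} rhs shape= {list(rhs)}")
        graph_vars[name] = saved

# A graph built for 31 classes cannot load a final layer trained with 28:
graph_vars = {"output/weights": np.zeros((512, 31))}
checkpoint = {"output/weights": np.zeros((512, 28))}
```

Calling restore_weights(graph_vars, checkpoint) here raises the same kind of error, which is why num_classes has to match the checkpoint whenever training resumes from saved weights.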
Is there somewhere else we need to amend the code to allow a different number of classes?
Hi,
A bit off topic, but do you know if we absolutely need to resize images to 256x256? My original size is 1920x1080, and when I resize, the image becomes very pixelated. I feel like my model will have more difficulty distinguishing species in the resized pictures.
I tried training without resizing. I get this message: ERROR:tensorflow:Exception in QueueRunner: Invalid JPEG data or crop window, data size 206074
but the program still continues.
Yes, images should be resized to 256x256. The model architecture is optimized using the number of pixels from a 256x256 image. Having way more pixels means the algorithm has to "look at" a ton more stuff. The model may run, but I'd expect performance to suffer. Be interested to hear the performance from resized vs un-resized images though.
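For the resize itself, a real pipeline would use something like Pillow or ImageMagick with proper resampling; the sketch below (function name is my own) just illustrates the index mapping of a nearest-neighbour downscale from 1920x1080 to the 256x256 input discussed above:

```python
import numpy as np

def resize_nearest(img, size=(256, 256)):
    # Nearest-neighbour downscale of an HxWxC image array to `size`.
    # Each output pixel copies the nearest source pixel; no interpolation.
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows][:, cols]
```

Better resampling filters (bilinear, Lanczos) reduce the pixelation the poster describes, but the pixel budget the model sees is the same either way.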
Hi @dwwolfson I trained a model using 2,000 photos with something in them and 40,000 photos with nothing but moving vegetation. I did not crop the pictures, and training crashed at epoch 62, but I was still able to run the model on my photos (I used a subset from July to train and a subset from August to classify). More than 95% of the time, the first guess is above 99% accuracy. My model returns some false positives (identifying something when the picture is empty) but very few false negatives, which is great! Picture size doesn't seem to matter, except maybe for processing time. Classifying 7,869 photos took 8.29 min and training crashed after 1.5 days.
Getting a similar error as dlnorman6 above... changing num_classes to something besides 28 is causing me problems. Looking into the Python code now.
The problem is in the train() R function. The parameter "--retrain_from USDA182" is hard-coded, which means it will continue training the model provided with the MLWIC code, which has 28 classes. If you remove that part, you can train your own model from scratch. There are several other parameters that are either missing or hard-coded in the R code (e.g. "--batch_size 128" is hard-coded and num_epoches is missing). I had to tweak both the train() and classify() functions to make them more generic, but now training models with different numbers of classes works. If you run out of GPU memory, try making --batch_size smaller; I am using 64. Hope that helps, happy to share my updated code.
I updated the train function so that you can specify retrain=FALSE if you want to train from scratch. Also, you can now specify batch_size. Using a smaller batch size will take longer, but it will be more accurate. It is better if your batch size is a multiple of 64.
Great, looks good and works for training so far. You are right, I'll need to change classify to let it know where to find the new, trained model.
Hi,
I just trained a model with 2 classes (caribou vs. empty photo) with the updated code, specifying retrain=FALSE and num_classes=2. Everything worked fine. However, when I try to classify using num_classes=2 I get the following error:
ValueError: input must have last dimension >= k = 5 but is 2 for 'TopKV2' (op: 'TopKV2') with input shapes: [?,2], [] and with computed input tensors: input[1] = <5>.
I'm pretty sure it has something to do with the five-guess structure of the package. Since I don't have 5 or more species, is there a way to tweak the package so I can just use a top 2 with the model I trained?
There is a parameter --top_n in the eval.py code that defaults to 5. The R code does not pass that parameter on to the Python function, so you either have to edit the classify R function, call the Python function directly from the command line, or edit the eval.py file and set the default to 2 (which might be easiest). Here is the list of all the Python parameters and their default values:
--load_size default=[256,256]
--crop_size default=[224,224]
--batch_size default=512
--num_classes default=1000
--top_n default=5
--num_channels default=3
--num_batches default=-1
--path_prefix default='./'
--delimiter default=' '
--data_info default='val.txt'
--num_threads default=20
--architecture default='resnet'
--depth default=50
--log_dir default=None
--save_predictions default=None
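The TopKV2 error earlier in the thread comes from asking for more guesses than the model has classes. A small NumPy sketch (my own illustration, not the eval.py code) of a top-n lookup that clamps n to the number of classes, which is the effect of setting --top_n to 2 for a 2-class model:

```python
import numpy as np

def top_n_guesses(probs, n=5):
    # Clamp n so a 2-class model doesn't request a top-5; requesting
    # n > num_classes is what triggers "input must have last dimension >= k".
    n = min(n, probs.shape[-1])
    # Indices of the n highest-probability classes, best guess first.
    return np.argsort(probs, axis=-1)[..., ::-1][..., :n]

probs = np.array([[0.05, 0.90, 0.05],
                  [0.70, 0.20, 0.10]])
```

With the clamp in place, top_n_guesses works unchanged for 2-class and 28-class outputs alike.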
I edited the eval.py and it works fine.
Thank you for the quick and easy answer @matobler .
I edited the classify function so you can now specify top_n as a parameter.
I tried to use train() to train a model with my own images, but it does not work, and the issue seems to be that I have only 5 species/categories instead of 28. I resized the images to 256x256 pixels and used numbers from 0 to 4 in the image_label csv, but I get this error:

When I change the number of species to 3 or something else, I get the same error with lhs shape= [3]. Do I need to do something else besides specifying the number of species in num_classes? train and classify worked fine with the example images.

And I was wondering if it is necessary to resize the pictures to 256x256 pixels, or if train() would also work with a different size?

This is my input: