Different performance when model is reloaded

soumith / imagenet-multiGPU.torch

an imagenet example in torch.

BSD 2-Clause "Simplified" License


Different performance when model is reloaded #12

Closed by dileepc 8 years ago

dileepc commented 9 years ago

Hi, I am training AlexNet on my own image data: approximately 10,000 images in 3 classes. I ran the training procedure and saved the model at the end of every epoch. The train and test log files show accuracies above 90%. But when I load a saved model and test it on the same training and testing data, I get very poor results.
    img = trainHook(basedir .. file)
    preds = model:forward( img:cuda() )
    _, pred_sorted = preds:sort( true )
    predictions[ file ] = pred_sorted[1]

trainHook takes care of cropping and mean/std normalization. Do you have any idea why this might happen? I can provide more information if it is not clear.

soumith commented 9 years ago

In your test script, before testing, make sure you call model:evaluate() which puts all the batchnorm modules into testing mode and switches off dropout.
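As a rough illustration of what that call changes, here is a minimal sketch assuming a working Torch7 install with the `nn` package. The two-layer model and the random input are made up for the example and stand in for the loaded network; they are not from this repository.

```lua
require 'nn'

-- Toy stand-in for a loaded model (hypothetical; not the repo's AlexNet).
local model = nn.Sequential()
model:add(nn.Linear(4, 4))
model:add(nn.Dropout(0.5))

local x = torch.randn(2, 4)

-- Switch every module to test mode: dropout stops zeroing activations.
model:evaluate()

local a = model:forward(x):clone()
local b = model:forward(x):clone()

-- In test mode two forward passes on the same input agree exactly.
print((a - b):abs():max())
```

Calling `model:training()` instead would turn dropout back on, and the two passes would generally differ.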

dileepc commented 9 years ago

I forgot to mention that I call model:evaluate() right after loading the model.

soumith commented 9 years ago

Are the test labels in the same order as during training? For example, while training they could be {cat, dog, horse}, but in your test script they might have been permuted to {dog, cat, horse}.
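One way to rule this out is to compare the two class lists directly. The sketch below is plain Lua; `sameLabelOrder` and both tables are hypothetical names for illustration, and in a real script the lists would come from wherever the training and test code build their class tables.

```lua
-- Hypothetical helper: true only if both lists hold the same class
-- names at the same indices.
local function sameLabelOrder(a, b)
   if #a ~= #b then return false end
   for i = 1, #a do
      if a[i] ~= b[i] then return false end
   end
   return true
end

-- Example class lists (made up for illustration).
local trainClasses = {'cat', 'dog', 'horse'}
local testClasses  = {'dog', 'cat', 'horse'}

print(sameLabelOrder(trainClasses, trainClasses))  -- true
print(sameLabelOrder(trainClasses, testClasses))   -- false
```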

dileepc commented 9 years ago

Yes, they are in the same order. I also tried getting predictions only on images from a particular class (base_dir points to a particular class), and even this gives poor results.

soumith commented 9 years ago

@dileepc I really am not sure. Reproducing the results with a loaded model hasn't been a problem for us. It might be something simple, like the image normalization being different in your test script.

dileepc commented 9 years ago

I think I found what's going wrong here. After the image is loaded and passed through trainHook, the tensor needs to be reshaped from (3,224,224) to (1,3,224,224), because model:forward() usually takes a batch of 128 images at once as a (128,3,224,224) tensor.
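The reshape described above could look roughly like this in Torch7. This is a sketch, not the code from the actual test script: the random tensor stands in for the output of trainHook, and the commented-out forward call assumes a model already loaded on the GPU.

```lua
require 'torch'

-- Stand-in for the output of trainHook: one normalized 3x224x224 crop.
-- (Random values here; a real script would load an image instead.)
local img = torch.FloatTensor(3, 224, 224):uniform()

-- Add a leading batch dimension of size 1, giving the same
-- (batch, channels, height, width) layout as the training batches.
local batch = img:view(1, img:size(1), img:size(2), img:size(3))

print(batch:size())  -- a 4D size: 1 x 3 x 224 x 224

-- With a loaded model the prediction step would then be:
--   local preds = model:forward(batch:cuda())
```

`view` does not copy data, so `batch` shares storage with `img`. The model's output then also carries a batch dimension, so the scores for the single image are in `preds[1]`.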

soumith commented 8 years ago

Thanks for pointing this out. Fixed the readme via commit https://github.com/soumith/imagenet-multiGPU.torch/commit/710bace7b0d0342b87a4d9ad590c0bec3d6de8ad