testing of pre-trained model on different test dataset

arslansadiq commented 5 years ago

Hi, I hope that you are doing ok.

Background: I am working on uncertainty estimation, so I need to test models trained on one dataset (say, MNIST) on test data from different datasets (say, FashionMNIST or notMNIST).

Problem: I have a question about testing a pre-trained model saved at <saved/[model_name]/model_best> on test data from a different dataset. For example, say I trained my model on the MNIST dataset and I want to test it, using the "model_best.pth" saved in <saved/[model_name]>, on FashionMNIST test data. How can I do that?

When testing, I pass model_best as the resume argument, which I believe carries the information about which dataset to test on. Changing the dataset path in the config.json file in <saved/[model_name]> doesn't help either; it always goes back to the dataset the model was trained on. Would you be kind enough to tell me how I can get around this problem?

SunQpark commented 5 years ago

There are probably several possible solutions to this.

I think switching the dataset based on the training argument of the data_loader class would be the simplest one.

Change lines 14 to 15 of data_loader.py, which are

self.data_dir = data_dir
self.dataset = datasets.MNIST(self.data_dir, train=training, ...)

into

self.data_dir = data_dir
if training:
    self.dataset = datasets.MNIST(self.data_dir, train=training, ...)
else:
    self.dataset = datasets.FashionMNIST(self.data_dir, train=training, ...)

But in this case FashionMNIST will just load the files from self.data_dir, which contains MNIST. That is because the two datasets have an identical format, so the PyTorch dataset class can't tell them apart. So we should also change the directories where the datasets are saved:

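# Requires `import os` at the top of data_loader.py.
if training:
    # Training data: MNIST, stored in its own subdirectory.
    self.data_dir = os.path.join(data_dir, 'mnist')
    self.dataset = datasets.MNIST(self.data_dir, train=True, ...)
else:
    # Test data: FashionMNIST, kept apart so the files do not collide.
    self.data_dir = os.path.join(data_dir, 'f_mnist')
    self.dataset = datasets.FashionMNIST(self.data_dir, train=False, ...)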

This should work, but I have not tested it yet.
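If more than two datasets are involved, the if/else above could be replaced by a lookup table. The following is only a minimal sketch: the `DATASETS` mapping, the dataset keys, and the helper names `resolve_dataset` and `build_dataset` are hypothetical and not part of the template; only `datasets.MNIST` and `datasets.FashionMNIST` are real torchvision classes.

```python
import os

# Hypothetical mapping: dataset key -> (torchvision class name, subdirectory).
# The keys and subdirectory names are illustrative assumptions.
DATASETS = {
    "mnist": ("MNIST", "mnist"),
    "fashion_mnist": ("FashionMNIST", "f_mnist"),
}


def resolve_dataset(data_dir, name):
    """Return the torchvision class name and a per-dataset root directory."""
    cls_name, subdir = DATASETS[name]
    return cls_name, os.path.join(data_dir, subdir)


def build_dataset(data_dir, name, train, **kwargs):
    """Instantiate the dataset in its own subdirectory (needs torchvision)."""
    # Imported lazily so that the path logic above works without torchvision.
    from torchvision import datasets

    cls_name, root = resolve_dataset(data_dir, name)
    return getattr(datasets, cls_name)(root, train=train, **kwargs)
```

With this, a loader could call something like `build_dataset(data_dir, "fashion_mnist", train=False, download=True)` for testing while still training on `"mnist"`.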

arslansadiq commented 5 years ago

This is absolutely going to work. I will test it first thing when I am free. Earlier, I hard-coded it in the test.py file where the data_loader is instantiated (lines 14-21), like this:

data_loader = module_data.FashionMNIST("data/", batch_size=512, shuffle=False, validation_split=0.0, training=False, num_workers=2)

FashionMNIST is my data loader class in "data_loader/data_loaders.py".

I have a couple of other data loader modules there as well, and I need to cross-test every dataset's test data against the models trained on every other dataset. So if I follow the solution you suggested, I'll need to pass some variable into every module in data_loaders.py and then add if/else conditions or a switch, right?

I was looking for a solution like manipulating it from the model_best.pth or config.json file, if that's possible.
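For reference, the cross-testing I have in mind amounts to enumerating every ordered (train, test) pair of distinct datasets. A minimal sketch, where the function name `cross_test_pairs` and the dataset names are just placeholders:

```python
from itertools import permutations


def cross_test_pairs(dataset_names):
    """Return every ordered (train_dataset, test_dataset) pair of distinct datasets."""
    return list(permutations(dataset_names, 2))


# Each pair means: load the model trained on `train_name`,
# then evaluate it on the test split of `test_name`.
for train_name, test_name in cross_test_pairs(["MNIST", "FashionMNIST", "notMNIST"]):
    print(f"model trained on {train_name} -> test data from {test_name}")
```

So for N datasets there are N * (N - 1) runs, which is why I would rather pick the test dataset from outside the data loader than hard-code it in each one.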

SunQpark commented 5 years ago

The config.json is saved into saved/[model name], but there is another copy of the config saved inside the model checkpoint (the pth file). The 'resume' option loads the configuration not from the json file but from the pth file, so changes to the json file are ignored. I agree that this can be a little confusing, and I will try to find out whether there is a simpler way of doing this.
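One possible workaround, as an untested sketch, is to patch the config taken from the checkpoint before building the data loader. The helper `override_config` is hypothetical, and both the `"config"` checkpoint key and the `"data_loader"` / `"type"` config keys shown in the comments are assumptions about the file layout, not something confirmed here.

```python
import copy


def override_config(config, overrides):
    """Return a deep copy of `config` with nested `overrides` applied on top."""
    merged = copy.deepcopy(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that sibling keys in nested sections are preserved.
            merged[key] = override_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Usage sketch (requires torch; key names are assumptions):
# checkpoint = torch.load("saved/[model_name]/model_best.pth", map_location="cpu")
# config = override_config(
#     checkpoint["config"],
#     {"data_loader": {"type": "FashionMNIST"}},
# )
```

The original config is left untouched, so the same checkpoint can be reused with different overrides for each test dataset.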

victoresque / pytorch-template

testing of pre-trained model on different test dataset #35