rakshithShetty / captionGAN

Source code for the paper "Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training"
MIT License

How to use .npz files from ResNet feature extractor #1

Open pratyushmore opened 6 years ago

pratyushmore commented 6 years ago

Hi Rakshith,

Thank you for putting up your code! It is extremely helpful.

I used the link (https://github.com/akirafukui/vqa-mcb/tree/master/preprocess) in your README to extract ResNet features for use in the adversarial training.

The extractor gave me a large number of .npz files. Your example makes it seem as though everything should be in one file. I am thus wondering how I should specify to the program which files to use as image features.

rakshithShetty commented 6 years ago


I usually prefer to concatenate all features into one large .npy file (N x feature dimensions), but the current code also supports a list of NumPy files. Just create a text file listing the feature files and give it the extension .npzl. If the extension of the input feature file is '.npzl', the data provider parses it as a list of NumPy files. Take a look at line 666 in imagernn/data_provider.py and the function called there.
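As a rough sketch of the two options above (not code from this repository): the snippet below either writes a `.npzl` list file or stacks the per-image files into one N x feature-dimension array. Several things here are assumptions and should be checked against imagernn/data_provider.py: that the list file holds one path per line, that sorted filename order is the order the data provider expects, that each `.npz` file holds a single feature vector, and that the array is stored under the key `"x"`. The helper names `write_feature_list` and `concat_features` are hypothetical.

```python
import glob
import os

import numpy as np


def write_feature_list(feature_dir, list_path):
    """Write the paths of all .npz files in feature_dir to list_path.

    Assumption: one path per line, in sorted filename order.
    """
    paths = sorted(glob.glob(os.path.join(feature_dir, "*.npz")))
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return paths


def concat_features(paths, key="x"):
    """Stack one feature vector per file into an (N, feat_dim) array.

    Assumption: every file stores its features under `key`; the real key
    name depends on the extractor that produced the files.
    """
    rows = []
    for p in paths:
        with np.load(p) as data:
            # Flatten so that (1, D) and (D,) arrays both become length-D rows.
            rows.append(np.asarray(data[key]).reshape(-1))
    return np.stack(rows, axis=0)
```

Usage would look something like `paths = write_feature_list("resnet_feats", "resnet_feats.npzl")` for the list-file route, or `np.save("resnet_feats.npy", concat_features(paths))` for the single-file route; the directory and file names here are placeholders.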

pratyushmore commented 6 years ago

Hi Rakshith,

Thank you for your response. Essentially, I just need to create a list of strings specifying all the npz files?

Also, should this include the train, and val files? Or should these be in three separate files?

pratyushmore commented 6 years ago

I also see the auxiliary input files. What do these entail? I do not quite understand how to create them.

pratyushmore commented 6 years ago

I tried running without the aux file, setting evaluation to image features. Got the following error:

Traceback (most recent call last):
  File "train_adversarial_caption_gen_v2.py", line 663, in <module>
  File "train_adversarial_caption_gen_v2.py", line 241, in main
    generator.model_th, params, xI = xI, xAux = xAux)
  File "/home/pmore/code/captionGAN/imagernn/lstm_generatorTheano.py", line 349, in build_prediction_model
    accLogProb, Idx, wOut_emb, updates, seq_lengths = self.lstm_advers_gen_layer(tparams, embImg, xAuxEmb, options, prefix=options['generator'])
  File "/home/pmore/code/captionGAN/imagernn/lstm_generatorTheano.py", line 592, in lstm_advers_gen_layer
    tparams[_p(prefix,'W_aux')]), n_samp,axis=0)
KeyError: 'lstm_W_aux'

pratyushmore commented 6 years ago

@rakshithShetty Was just wondering if you had seen these. I am working on a research paper for which I would like to cite your paper. Would appreciate your help in running your model.

rakshithShetty commented 6 years ago

Hi, sorry for the delayed response. I use image_feature_file and aux_inp_file to provide two different features to the captioning LSTM. In our paper we used two features: CNN features extracted from ResNet, and object detection features from Faster RCNN (fasterRcnn_clasDetFEat80.npy in the example command). I have now uploaded this feature file to Google Drive, since it's a small file. You can use it if you wish. (https://drive.google.com/drive/folders/0B76QzqVJdOJ5TV9FMjhpVmlsTFE) For details of how this feature is extracted, refer to section 3.1 of (https://dl.acm.org/citation.cfm?id=2983571)

Since I have been using both feature files recently, some parts of the code have not been properly guarded against the absence of the aux input. That's the error you are seeing.

rakshithShetty commented 6 years ago

By the way, I have pushed the Faster RCNN codebase I used to extract this feature vector here. It only has bare-bones usage instructions for now.


pratyushmore commented 6 years ago

Hi Rakshith.

Thank you so much. I set the feature file to the file you uploaded, and set the aux input file to the ResNet features.

I get the following error:

Traceback (most recent call last):
  File "train_adversarial_caption_gen_v2.py", line 663, in <module>
  File "train_adversarial_caption_gen_v2.py", line 187, in main
    dp = getDataProvider(params)
  File "/home/pmore/code/captionGAN/imagernn/data_provider.py", line 597, in getDataProvider
    return BasicDataProvider(params)
  File "/home/pmore/code/captionGAN/imagernn/data_provider.py", line 94, in __init__
    imgIdSet = allIdSet)
  File "/home/pmore/code/captionGAN/imagernn/data_provider.py", line 629, in loadFromLbls
    features, hdf5Flag = readFeaturesFromFile(features_path, idxes = feat_load_list)
  File "/home/pmore/code/captionGAN/imagernn/data_provider.py", line 814, in readFeaturesFromFile
    features, hdf5_flag = loadSingleFeat(filename, idxes, mat_new_ver)
  File "/home/pmore/code/captionGAN/imagernn/data_provider.py", line 797, in loadSingleFeat
    return features, hdf5_flag
UnboundLocalError: local variable 'features' referenced before assignment

Any idea as to why this is happening?

pratyushmore commented 6 years ago


I used the faster_RCNN features as the aux input. I used the ResNet features as feature file input. Also used disk_feature = 1.

After these changes, data reading seems to work.

I am now getting the following error:

Traceback (most recent call last):
  File "train_adversarial_caption_gen_v2.py", line 663, in <module>
  File "train_adversarial_caption_gen_v2.py", line 399, in main
    tsc_max, tsc_mean, tsc_min = eval_gen_samps(f_gen_only, dp, params, misc, params['rev_eval'], **trackMetargs)
  File "train_adversarial_caption_gen_v2.py", line 47, in eval_gen_samps
    g_out = gen_func(*inp)
  File "/home/pmore/anaconda2/envs/caption_gan/lib/python2.7/site-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/pmore/anaconda2/envs/caption_gan/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/pmore/anaconda2/envs/caption_gan/lib/python2.7/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
  File "pygpu/blas.pyx", line 47, in pygpu.blas.pygpu_blas_rgemm
pygpu.gpuarray.GpuArrayException: ('mismatched shapes', 2)
Apply node that caused the error: GpuDot22(GpuFromHost<None>.0, WIemb_aux)
Toposort index: 23
Inputs types: [GpuArrayType<None>(float32, matrix), GpuArrayType<None>(float32, matrix)]
Inputs shapes: [(100, 80), (2048, 512)]
Inputs strides: [(320, 4), (2048, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuElemwise{Add}[(0, 0)]<gpuarray>(GpuDot22.0, InplaceGpuDimShuffle{x,0}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "train_adversarial_caption_gen_v2.py", line 663, in <module>
  File "train_adversarial_caption_gen_v2.py", line 241, in main
    generator.model_th, params, xI = xI, xAux = xAux)
  File "/home/pmore/code/captionGAN/imagernn/lstm_generatorTheano.py", line 338, in build_prediction_model
    xAuxEmb = tensor.dot(xAux,tparams['WIemb_aux']) + tparams['b_Img_aux']
  File "train_adversarial_caption_gen_v2.py", line 663, in <module>
  File "train_adversarial_caption_gen_v2.py", line 241, in main
    generator.model_th, params, xI = xI, xAux = xAux)
  File "/home/pmore/code/captionGAN/imagernn/lstm_generatorTheano.py", line 338, in build_prediction_model
    xAuxEmb = tensor.dot(xAux,tparams['WIemb_aux']) + tparams['b_Img_aux']

Noahsark commented 6 years ago

Hi Rakshith, may I ask whether the order of features in the .npz file has to be exactly the same as the order in your "labels.txt" mapping file?
