Open · arunikayadav42 opened 4 years ago

Hi @peri044, I want to train the STT network with my own data, and I want to preprocess the images. Can you please point me to the python script that can help me do this?
@arunikayadav42 Check out the extract_image_features.py script in the STT repo. It has the necessary calls for preprocessing input images. The backbone-specific preprocessor implementations are in the preprocessing directory.
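For a rough idea, a backbone preprocessor typically amounts to something like the sketch below. This is a minimal illustration assuming ImageNet-style normalization, not the actual code in extract_image_features.py or the preprocessing directory:

```python
import numpy as np
from PIL import Image

def preprocess_image(path, size=224):
    # Resize and scale to [0, 1]; the exact input size depends on the backbone.
    img = Image.open(path).convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    # Assumed ImageNet mean/std normalization (check the backbone's
    # preprocessor in the repo for the real constants).
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (arr - mean) / std
```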
@peri044 I had a doubt regarding generating the paraphrases. In the README file you have mentioned that we create train_enc.txt and train_dec.txt using the captions_train2014.json file. How, then, are those captions mapped to the corresponding image in the train.npy features from the SCAN repository?
@arunikayadav42 I don't remember the exact data structure details of the SCAN data as it has been a while. The way I create the paraphrases (train_enc.txt and train_dec.txt) is here. The gist of the process: each image (with an image ID) has 5 captions, which gives 5 × 4 = 20 ordered combinations of sentences tied to that image ID. Using the same image ID, you can extract the SCAN features (downloaded from their repository) for the corresponding image and tie them to the combinations of captions.
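Roughly, the pairing amounts to the sketch below, where captions_by_image is a hypothetical dict parsed from captions_train2014.json (the linked code is authoritative):

```python
from itertools import permutations

def caption_pairs(captions_by_image):
    # captions_by_image: hypothetical dict of image_id -> list of 5 captions,
    # parsed from captions_train2014.json.
    for image_id, caps in sorted(captions_by_image.items()):
        # Ordered 2-permutations of 5 captions give the 5 * 4 = 20 pairs.
        for src, tgt in permutations(caps, 2):
            yield image_id, src, tgt
```

Writing the first sentence of each pair to train_enc.txt and the second to train_dec.txt, in image-ID order, keeps the lines aligned with the per-image SCAN features.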
@peri044 So my only question is: when we have the 20 combinations and we go on to store them in the tfrecord files, then each of these combinations needs to have the corresponding image feature, and all of them get stored in the tfrecord file. Isn't that right?
For instance, if the image id is coco_train_1, then the feature from the SCAN data for this image id will be paired with each of the 20 combinations of the captions of this image, right?
So at this line, https://github.com/peri044/STT/blob/master/data/coco_data_loader.py#L105, should it not be `(img_idx * 20, img_idx * 20 + 20)` instead of `(img_idx * 5, img_idx * 5 + 5)`?
Yes. The image feature (for the image id) is replicated for each of the 20 combinations of the captions.
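In other words, the index arithmetic works out like this (a hypothetical illustration, not the exact loader code):

```python
def caption_block(img_idx, pairs_per_image=20):
    # Each image owns a contiguous block of pairs_per_image rows in the
    # flattened caption arrays, as a half-open [start, end) range.
    start = img_idx * pairs_per_image
    return start, start + pairs_per_image

print(caption_block(3))  # (60, 80): rows 60..79 belong to image 3
```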
Probably, the data loader script you linked is not the one I used during my experiments. Currently the data loader scripts are all over the place in the data folder, and I don't remember the exact ones I used due to quick experimentation. You can probably refer to https://github.com/peri044/STT/blob/master/data/coco_extras/coco_feat_stt.py#L50, which writes an image feature for every sentence combination in a tfrecord.
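The core of that write step looks roughly like the sketch below. The feature keys ("image_feature", "enc_caption", "dec_caption") and the tf.io.TFRecordWriter usage are assumptions for illustration, not the exact names in coco_feat_stt.py:

```python
import numpy as np
import tensorflow as tf

def write_examples(writer, image_feat, pairs):
    # One tf.train.Example per caption pair; the image feature vector is
    # replicated verbatim into every example for its image.
    for src, tgt in pairs:
        example = tf.train.Example(features=tf.train.Features(feature={
            "image_feature": tf.train.Feature(
                float_list=tf.train.FloatList(value=image_feat.tolist())),
            "enc_caption": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[src.encode("utf-8")])),
            "dec_caption": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[tgt.encode("utf-8")])),
        }))
        writer.write(example.SerializeToString())

# Usage: replicate one image's SCAN feature across its 20 caption pairs.
with tf.io.TFRecordWriter("train.tfrecord") as writer:
    feat = np.random.rand(2048).astype(np.float32)  # stand-in SCAN feature
    pairs = [("a dog runs", "a dog is running")] * 20  # stand-in pairs
    write_examples(writer, feat, pairs)
```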
All the modules for data loader/TFRecord generation are in the data directory. They aren't well organized on a per-model basis (e.g. stt, stt-att, scan, etc.). However, all the components that are used in the experiments of the paper can be found (scattered) in the data directory.