reedscot / cvpr2016

Learning Deep Representations of Fine-grained Visual Descriptions
http://arxiv.org/abs/1605.05395
MIT License

.t7 caption files #7

Open shenkev opened 7 years ago

shenkev commented 7 years ago

Hi Scot,

Quick question about the bird dataset you're using.

I downloaded the bird dataset as per your instructions:

##### How to train a char-CNN-RNN model:
1. Download the birds and flowers data.

Inside the cvpr2016_cub/text_c10 directory, there are .t7 files, e.g. 200.Common_Yellowthroat.t7.

Upon opening them, I found that each one is a 60x201x10 tensor of integers. I'm guessing 60 is the number of images per species and 10 is the number of captions per image. What is the 201 dimension? Is it the vocabulary size of the captions? And what do the integers themselves represent? The values range from 0 to about 70, and many of them are 0.

GaryLMS commented 7 years ago

I think 201 is the sentence length: a sentence shorter than 201 is padded with zeros, and a longer one is truncated.
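To make the padding idea concrete, here is a minimal Python sketch of fixing a sequence of indices to length 201. It is not taken from the repository: the function name `pad_or_truncate` and the use of 0 as the padding value are assumptions based only on what is described in this thread.

```python
MAX_LEN = 201  # sentence-length dimension reported in this thread
PAD = 0        # assumed padding index (the thread notes many values are 0)


def pad_or_truncate(indices, max_len=MAX_LEN, pad=PAD):
    """Return a list of exactly max_len indices.

    Hypothetical helper: longer inputs are cut at max_len,
    shorter inputs are right-padded with the pad value.
    """
    clipped = list(indices[:max_len])
    return clipped + [pad] * (max_len - len(clipped))
```

For example, `pad_or_truncate([1, 2, 3])` gives a 201-element list that starts with 1, 2, 3 and is followed by zeros.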

jayelm commented 5 years ago

Since there are only ~70 possible values, the actual integers here seem to be character indices. Not sure what the precise mapping is. For word-level encodings see the word_c10 directory (see #8).
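If the values really are character indices, decoding a caption would look roughly like the Python sketch below. The alphabet here is hypothetical, since the actual mapping has not been confirmed in this thread, and the 1-based indexing with 0 as padding is likewise an assumption. The tensor is represented as plain nested lists shaped [images][length][captions] to match the 60x201x10 layout described above.

```python
import string

# HYPOTHETICAL alphabet for illustration only; the repository's real
# character-to-index mapping may differ in both content and order.
ALPHABET = string.ascii_lowercase + string.digits + " .,;:!?'-"


def decode(indices, alphabet=ALPHABET):
    """Map assumed 1-based indices to characters, skipping 0 (assumed padding)."""
    chars = []
    for i in indices:
        if i == 0:
            continue  # padding
        if 1 <= i <= len(alphabet):
            chars.append(alphabet[i - 1])
        else:
            chars.append("\ufffd")  # index outside the assumed alphabet
    return "".join(chars)


def caption_column(tensor, image, caption):
    """Pull one caption's indices out of nested lists [images][length][captions]."""
    return [row[caption] for row in tensor[image]]


# Tiny stand-in for one image with two captions of length 5.
toy = [[[2, 0], [9, 0], [18, 0], [4, 0], [0, 0]]]
print(decode(caption_column(toy, image=0, caption=0)))  # bird
```

With the real files, the same slicing would apply along the second (201) dimension once the tensor is loaded and the true alphabet is known.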