Open shenkev opened 7 years ago
I think 201 is the sentence length: sentences shorter than 201 are padded with zeros, and longer ones are cut off.
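A minimal sketch of that padding and truncation, assuming the pad value is 0 and the fixed length is 201 as described above. The function name `pad_or_truncate` is hypothetical and not taken from the dataset's own preprocessing code.

```python
MAX_LEN = 201  # size of the middle dimension reported in the question


def pad_or_truncate(indices, max_len=MAX_LEN, pad_value=0):
    """Return a list of exactly max_len integers.

    Sequences longer than max_len are cut; shorter ones are
    right-padded with pad_value (assumed to be 0 here).
    """
    clipped = list(indices[:max_len])
    return clipped + [pad_value] * (max_len - len(clipped))


short = pad_or_truncate([5, 3, 9])
long_ = pad_or_truncate(list(range(1, 300)))
print(len(short), len(long_))
```

Both results have the same length, which is what would let 10 captions per image be stacked into one fixed-size tensor.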
Since there are only ~70 possible values, the integers seem to be character indices. I'm not sure what the precise mapping is. For word-level encodings, see the word_c10 directory (see #8).
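To illustrate what a character-index encoding could look like, here is a rough sketch. The alphabet below is purely an assumption for demonstration, since the precise mapping is not confirmed in this thread. The 1-based indexing with 0 reserved for padding is also a guess, based on the many zeros reported in the tensors.

```python
import string

# ASSUMPTION: illustrative alphabet only; the dataset's real mapping may differ.
ALPHABET = string.ascii_lowercase + string.digits + " .,;!?'-"

# ASSUMPTION: indices start at 1, and 0 is reserved for padding.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def encode(text):
    """Map each known character to its 1-based index, dropping unknown ones."""
    return [CHAR_TO_INDEX[ch] for ch in text.lower() if ch in CHAR_TO_INDEX]


def decode(indices):
    """Map indices back to characters, skipping 0 (padding) and out-of-range values."""
    return "".join(ALPHABET[i - 1] for i in indices if 0 < i <= len(ALPHABET))


padded = encode("a small bird") + [0, 0, 0]
print(decode(padded))
```

If decoding a real caption with a candidate alphabet produces readable text, that is a reasonable sign the mapping is right.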
Hi Scot,
Quick question about the bird dataset you're using.
I downloaded the bird dataset as per your instructions:
Inside the cvpr2016_cub/text_c10 directory, there are .t7 files, e.g.
200.Common_Yellowthroat.t7
Upon opening them, I found that they are 60x201x10 tensors of integers. I'm guessing 60 is the number of images per species and 10 is the number of captions per image. What is the 201 dimension? Is it the vocabulary size of the captions? And what do the integers represent? I see values from 0 to around 70, with many of them being 0.