reedscot / cvpr2016

Learning Deep Representations of Fine-grained Visual Descriptions
http://arxiv.org/abs/1605.05395
MIT License

What is vocab_c10.t7 (in the CUB data) used for? #8

Open jingliao132 opened 6 years ago

jingliao132 commented 6 years ago

Hello scot! I am confused about how the .h5 files were produced from the .txt files (in folder text_c10) while checking the CUB data downloaded from the link you provided. When I open an .h5 file, I find keys like 'txt1', 'txt2', ..., 'txt10'. Since there are exactly ten text descriptions in each .txt file, I guess each key corresponds to one description. The value of 'txt1' is a one-dimensional tensor of shape (90,): [116, 104, 101, ..., 46], and 'txt2' has shape (76,). The lengths 90 and 76 are very close to the number of characters in each description, so I guess each tensor is a character-level encoding of its description. However, vocab_c10.t7 is a dictionary containing many words (word-level), which seems inconsistent. How do you encode each text description from .txt to .h5, and how do you generate the .t7 files (6020110 DoubleTensor) under /text_c10?
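
For reference, this is roughly how I inspected the file (a minimal Python sketch using h5py; "example.h5" is a placeholder for an actual file from the CUB data):

```python
import h5py

# "example.h5" is a placeholder; substitute a real .h5 file from text_c10.
with h5py.File("example.h5", "r") as f:
    for key in sorted(f.keys()):      # 'txt1', 'txt10', 'txt2', ... (string sort)
        data = f[key][()]             # 1-D integer array, one entry per character
        print(key, data.shape, data[:5])
```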

jayelm commented 5 years ago

Correct, the files in text_c10 seem to contain character-level encodings of the descriptions. There are ~70 possible integer values, which perhaps corresponds to a lowercase + uppercase alphabet plus punctuation. I haven't been able to figure out where the character -> index mapping lives... or whether we have to figure it out ourselves.
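
One guess worth testing: the sample values quoted above (116, 104, 101, ..., 46) happen to spell 't', 'h', 'e', ..., '.' under plain ASCII, so the mapping may just be ASCII codes. A quick, untested check (placeholder filename):

```python
import h5py

# Untested guess: the integers are plain ASCII codes. The sample values above
# (116, 104, 101, ..., 46) would decode to 't', 'h', 'e', ..., '.' under ASCII.
with h5py.File("example.h5", "r") as f:   # placeholder filename
    codes = f["txt1"][()]
    text = "".join(chr(int(c)) for c in codes)
    print(text)  # if this is readable English, the ASCII hypothesis holds
```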

On the other hand, the files in word_c10 contain word-level encodings of the descriptions. The vocab_c10.t7 file maps words to these integers (confirmed by manually translating some of the vectors in word_c10).
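
For reference, here is roughly how such a check can be done from Python, using the third-party torchfile package to read the .t7. This is a sketch under the assumption that vocab_c10.t7 loads as a word -> index table; the filenames are placeholders:

```python
import h5py
import torchfile  # third-party: pip install torchfile

# Assumption: vocab_c10.t7 is a Lua table mapping word strings to 1-based indices.
vocab = torchfile.load("vocab_c10.t7")
# torchfile typically returns string keys as bytes; invert to index -> word.
idx2word = {int(i): w.decode("utf-8") for w, i in vocab.items()}

with h5py.File("word_example.h5", "r") as f:  # placeholder file from word_c10
    indices = f["txt1"][()]
    print(" ".join(idx2word.get(int(i), "<unk>") for i in indices))
```

If the output is a readable sentence, the vocab really is a word -> index map; if not, the table may be inverted (index -> word) and the dict comprehension would need to be flipped.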