luo3300612 / image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
BSD 3-Clause "New" or "Revised" License

Questions about h5py.File features on customized images! #2

Open liman13552763129 opened 3 years ago

liman13552763129 commented 3 years ago

Hi, in your code the h5py.File has keys like ['%d_features' % image_id], ['%d_grids' % image_id], ['%d_boxes' % image_id], ['%d_size' % image_id], ['%d_mask' % image_id]. Can you explain the meaning of these five keys?
I know you extract the grid features with https://github.com/facebookresearch/grid-feats-vqa. Can you share how the other keys (features, boxes, size, mask) are extracted, or how to obtain them? Thanks! Hope you reply soon!

luo3300612 commented 3 years ago

Thanks for your interest. As you can see, there are five kinds of keys in our .hdf5 file. They are:

- `['%d_features' % image_id]`: region features
- `['%d_boxes' % image_id]`: bounding boxes of the regions
- `['%d_size' % image_id]`: the size of the image
- `['%d_grids' % image_id]`: grid features
- `['%d_mask' % image_id]`: the alignment mask between region and grid features

The first three keys can be obtained when extracting region features. The last key can be obtained from the geometric relationship between grid features and region features. I will upload the extraction method and update our README to explain it more clearly. Thanks for asking!
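Roughly, the mask records which grid cells each region box overlaps. Below is a minimal sketch (not the exact extraction script; the function name, shapes, and the simple overlap test are only for illustration, assuming a 7x7 grid and boxes given as (x1, y1, x2, y2) in pixel coordinates):

```python
import torch

def build_alignment_mask(boxes, image_size, grid_size=7):
    """Toy example: mark which grid cells overlap each region box.

    boxes:      (N_regions, 4) tensor of (x1, y1, x2, y2) in pixels
    image_size: (h, w) of the original image
    returns:    (N_regions, grid_size * grid_size) binary mask
    """
    h, w = image_size
    cell_h, cell_w = h / grid_size, w / grid_size
    mask = torch.zeros(boxes.size(0), grid_size * grid_size)
    for i, (x1, y1, x2, y2) in enumerate(boxes.tolist()):
        for gy in range(grid_size):
            for gx in range(grid_size):
                # boundaries of this grid cell in pixel coordinates
                cx1, cy1 = gx * cell_w, gy * cell_h
                cx2, cy2 = cx1 + cell_w, cy1 + cell_h
                # region and grid cell are aligned if they overlap spatially
                if x1 < cx2 and x2 > cx1 and y1 < cy2 and y2 > cy1:
                    mask[i, gy * grid_size + gx] = 1.0
    return mask
```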

liman13552763129 commented 3 years ago

Hi, about the fourth key ['%d_grids' % image_id]: grid features (N_grids, feature_dim), where N_grids is 7x7 = 49 and feature_dim is 2048? The grid features obtained from grid-feats-vqa have shape torch.Size([1, 2048, 26, 19]); after applying torch.nn.AdaptiveAvgPool2d((7, 7)) the shape is torch.Size([1, 2048, 7, 7]). To get torch.Size([49, 2048]) grid features from torch.Size([1, 2048, 7, 7]) (marked as A), I think it needs three steps:

1. squeeze the batch dimension: [1, 2048, 7, 7] -> [2048, 7, 7]
2. flatten the 7x7 spatial grid: [2048, 7, 7] -> [2048, 49]
3. transpose the two dimensions: [2048, 49] -> [49, 2048]

Is the conversion above right? If it's not right, how should it be done?
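In code, the three steps I have in mind would be roughly (A is the pooled tensor from grid-feats-vqa; the variable names are just for illustration):

```python
import torch

# A is the pooled grid feature map: [1, 2048, 7, 7]
A = torch.randn(1, 2048, 7, 7)

grids = A.squeeze(0)         # step 1: drop the batch dim   -> [2048, 7, 7]
grids = grids.flatten(1)     # step 2: flatten the 7x7 grid -> [2048, 49]
grids = grids.permute(1, 0)  # step 3: grids first          -> [49, 2048]

print(grids.shape)  # torch.Size([49, 2048])
```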

liman13552763129 commented 3 years ago

@luo3300612 hi, hope you reply soon! thank you very much!

luo3300612 commented 3 years ago

yes, it is right

liman13552763129 commented 3 years ago

thanks!

liman13552763129 commented 3 years ago

@luo3300612 Hi, when I start to optimize the model with the CIDEr reward at a learning rate of 5×10^-6, the loss is 0 at the beginning and after several batches it changes to a negative value. In your code the loss is: `loss = -torch.mean(log_probs, -1) * (reward - reward_baseline)`. I printed log_probs, reward and reward_baseline; the log_probs are negative values and the others are positive values. Is this right? And what causes this phenomenon (the loss being negative)? Hope you reply soon! Thank you very much!
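For reference, here is a toy sketch of how I read that loss (the numbers are made up and the real shapes in the repo may differ):

```python
import torch

# log-probs of sampled words are always <= 0;
# CIDEr rewards and the baseline are positive
log_probs = torch.tensor([[-2.3, -1.7, -0.9],
                          [-3.1, -2.2, -1.5]])   # (batch, seq_len)
reward = torch.tensor([1.2, 0.8])                # CIDEr of sampled captions
reward_baseline = torch.tensor([1.0, 1.0])       # baseline CIDEr

loss = -torch.mean(log_probs, -1) * (reward - reward_baseline)
loss = loss.mean()
print(loss)  # negative here, because the second sample's reward is below the baseline
```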

liman13552763129 commented 3 years ago

@luo3300612 Hi, sorry to bother you again. Hope you can reply to the question above, thank you very much!

z972778371 commented 2 years ago

@luo3300612 Hello, sorry to bother you again. I hope you can reply to the questions above, thank you very much!

Hello, I would also like to ask: do you have code to generate a caption for a single image, like Figure 1 or Figure 5 in the paper? If so, could you share it with me? Thanks!!

cxy990729 commented 2 years ago

Hi, I wonder how to build my own dataset. Could you provide me with the script to extract the five keys? @liman13552763129

YinghuaYa commented 1 year ago

Hi! Could you share a copy of the h5py file you downloaded earlier? The zip file in the link is damaged now. In addition, if I want to run on my own dataset, how do I obtain this h5py file? @luo3300612 @liman13552763129 @cxy990729