luo3300612 / image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
BSD 3-Clause "New" or "Revised" License

Questions about h5py.File features on customized images! #2

Open liman13552763129 opened 3 years ago

liman13552763129 commented 3 years ago

Hi, in your code the h5py.File has keys like ['%d_features' % image_id], ['%d_grids' % image_id], ['%d_boxes' % image_id], ['%d_size' % image_id], ['%d_mask' % image_id]. Can you explain the meaning of these five keys?
I know you extract the grid features with https://github.com/facebookresearch/grid-feats-vqa. Can you share how the other keys (features, boxes, size, mask) are extracted, or how to obtain them? Thanks! Hope you reply soon!

luo3300612 commented 3 years ago

Thanks for your interest. As you can see, there are five kinds of keys in our .hdf5 file. They are:

- `['%d_features' % image_id]`: region features
- `['%d_boxes' % image_id]`: bounding boxes of the regions
- `['%d_size' % image_id]`: the size of the image
- `['%d_grids' % image_id]`: grid features
- `['%d_mask' % image_id]`: the alignment mask between region and grid features

The first three keys can be obtained when extracting region features. The last key can be obtained from the geometric relationship between grid features and region features. I will upload the extraction method and update our README to explain it more clearly. Thanks for asking!
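Roughly, the mask records which grid cells each region box overlaps. Below is a minimal sketch (not the exact extraction script; the function name, shapes, and the simple overlap test are only for illustration, assuming a 7x7 grid and boxes given as (x1, y1, x2, y2) in pixel coordinates):

```python
import torch

def build_alignment_mask(boxes, image_size, grid_size=7):
    """Toy example: mark which grid cells overlap each region box.

    boxes:      (N_regions, 4) tensor of (x1, y1, x2, y2) in pixels
    image_size: (h, w) of the original image
    returns:    (N_regions, grid_size * grid_size) binary mask
    """
    h, w = image_size
    cell_h, cell_w = h / grid_size, w / grid_size
    mask = torch.zeros(boxes.size(0), grid_size * grid_size)
    for i, (x1, y1, x2, y2) in enumerate(boxes.tolist()):
        for gy in range(grid_size):
            for gx in range(grid_size):
                # boundaries of this grid cell in pixel coordinates
                cx1, cy1 = gx * cell_w, gy * cell_h
                cx2, cy2 = cx1 + cell_w, cy1 + cell_h
                # region and grid cell are aligned if they overlap spatially
                if x1 < cx2 and x2 > cx1 and y1 < cy2 and y2 > cy1:
                    mask[i, gy * grid_size + gx] = 1.0
    return mask
```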

liman13552763129 commented 3 years ago

Hi, about the fourth key ['%d_grids' % image_id]: grid features (N_grids, feature_dim), where N_grids is 7x7 = 49 and feature_dim is 2048? The grid features obtained from grid-feats-vqa have shape torch.Size([1, 2048, 26, 19]); after applying torch.nn.AdaptiveAvgPool2d((7, 7)) the shape is torch.Size([1, 2048, 7, 7]). To get torch.Size([49, 2048]) grid features from torch.Size([1, 2048, 7, 7]) (marked as A), I think it needs three steps:

1. squeeze the batch dimension: [1, 2048, 7, 7] -> [2048, 7, 7]
2. flatten the 7x7 spatial grid: [2048, 7, 7] -> [2048, 49]
3. transpose the two dimensions: [2048, 49] -> [49, 2048]

Is the conversion above right? If it's not right, how should it be done?
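In code, the three steps I have in mind would be roughly (A is the pooled tensor from grid-feats-vqa; the variable names are just for illustration):

```python
import torch

# A is the pooled grid feature map: [1, 2048, 7, 7]
A = torch.randn(1, 2048, 7, 7)

grids = A.squeeze(0)         # step 1: drop the batch dim   -> [2048, 7, 7]
grids = grids.flatten(1)     # step 2: flatten the 7x7 grid -> [2048, 49]
grids = grids.permute(1, 0)  # step 3: grids first          -> [49, 2048]

print(grids.shape)  # torch.Size([49, 2048])
```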

liman13552763129 commented 3 years ago

@luo3300612 hi, hope you reply soon! thank you very much!

luo3300612 commented 3 years ago

yes, it is right

liman13552763129 commented 3 years ago

thanks!

liman13552763129 commented 3 years ago

@luo3300612 Hi, when I start to optimize the model with the CIDEr reward at a learning rate of 5×10^-6, the loss is 0 at the beginning and after several batches it changes to a negative value. In your code the loss is: `loss = -torch.mean(log_probs, -1) * (reward - reward_baseline)`. I printed log_probs, reward and reward_baseline; the log_probs are negative values and the others are positive values. Is this right? And what causes this phenomenon (the loss being negative)? Hope you reply soon! Thank you very much!
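For reference, here is a toy sketch of how I read that loss (the numbers are made up and the real shapes in the repo may differ):

```python
import torch

# log-probs of sampled words are always <= 0;
# CIDEr rewards and the baseline are positive
log_probs = torch.tensor([[-2.3, -1.7, -0.9],
                          [-3.1, -2.2, -1.5]])   # (batch, seq_len)
reward = torch.tensor([1.2, 0.8])                # CIDEr of sampled captions
reward_baseline = torch.tensor([1.0, 1.0])       # baseline CIDEr

loss = -torch.mean(log_probs, -1) * (reward - reward_baseline)
loss = loss.mean()
print(loss)  # negative here, because the second sample's reward is below the baseline
```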

liman13552763129 commented 3 years ago

@luo3300612 Hi, sorry to bother you again. Hope you can reply to the question above, thank you very much!

z972778371 commented 2 years ago

@luo3300612 Hello, sorry to bother you again. I hope you can reply to the questions above, thank you very much!

Hello, I would also like to ask: do you have code to generate a caption for a single image, like Figure 1 or Figure 5 in the paper? If so, could you share it with me? Thanks!!

cxy990729 commented 2 years ago

Hi, I wonder how to build my own dataset. Could you provide me with the script to extract the five keys? @liman13552763129

YinghuaYa commented 1 year ago

Hi! Could you share a copy of the h5py file you downloaded earlier? The zip file in the link is damaged now. In addition, if I want to run on my own dataset, how do I obtain this h5py file? @luo3300612 @liman13552763129 @cxy990729