luo3300612 / image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
BSD 3-Clause "New" or "Revised" License
193 stars, 31 forks

RuntimeError: stack expects each tensor to be equal size #35

Closed: competent-s closed this issue 2 years ago

competent-s commented 2 years ago

Hello, are the dimensions of the features you extracted with https://github.com/facebookresearch/grid-feats-vqa fixed? The features I get from [extract_grid_feature.py] have a different spatial size for each image, so when I train in batches I get: RuntimeError: stack expects each tensor to be equal size, but got [1, 2048, 18, 32] at entry 0 and [1, 2048, 19, 29] at entry 1

luo3300612 commented 2 years ago

After extraction, just use PyTorch's adaptive average pooling to pool every feature map to 7*7.

competent-s commented 2 years ago

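But the extracted features are four-dimensional, and I'm not sure how to convert them to (49, 2048). Could you share the source code you used to extract the grid features?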

luo3300612 commented 2 years ago

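The conversion is simple. For example, first use AdaptiveAvgPool2d to turn [1,2048,18,32] into [1,2048,7,7], then reorder the dimensions BCHW->BHWC to get [1,7,7,2048], then merge the first three dimensions and you get 49*2048.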
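The steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual extraction code: the helper name `grid_to_sequence` and the random input tensors are assumptions made for the example, and only standard PyTorch calls (`adaptive_avg_pool2d`, `permute`, `reshape`, `stack`) are used.

```python
import torch
import torch.nn.functional as F


def grid_to_sequence(feat: torch.Tensor, output_size: int = 7) -> torch.Tensor:
    """Pool a [B, C, H, W] grid feature map to a fixed grid and flatten it.

    Hypothetical helper for illustration; returns a [B * S * S, C] tensor,
    where S is output_size.
    """
    # Adaptive average pooling maps any H x W to S x S: [B, C, H, W] -> [B, C, S, S]
    pooled = F.adaptive_avg_pool2d(feat, (output_size, output_size))
    # Reorder BCHW -> BHWC: [B, C, S, S] -> [B, S, S, C]
    pooled = pooled.permute(0, 2, 3, 1)
    # Merge the first three dimensions: [B, S, S, C] -> [B * S * S, C]
    return pooled.reshape(-1, pooled.shape[-1])


# Two feature maps with the mismatched spatial sizes from the error message
# (random values stand in for real extracted features).
a = grid_to_sequence(torch.randn(1, 2048, 18, 32))
b = grid_to_sequence(torch.randn(1, 2048, 19, 29))

# Both are now [49, 2048], so stacking them into a batch no longer fails.
batch = torch.stack([a, b])
print(tuple(a.shape), tuple(b.shape), tuple(batch.shape))
```

Since every image ends up with the same [49, 2048] shape, pooling once right after extraction and saving the result should let the default batching stack the features without a custom collate function.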

competent-s commented 2 years ago

Thank you for your reply. Have a nice day.