RuntimeError: stack expects each tensor to be equal size

luo3300612 / image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).

BSD 3-Clause "New" or "Revised" License

193 stars 31 forks

RuntimeError: stack expects each tensor to be equal size #35

Closed competent-s closed 2 years ago

competent-s commented 2 years ago

Hello, are the features you extracted with https://github.com/facebookresearch/grid-feats-vqa a fixed size? The features I get from [extract_grid_feature.py] have different spatial dimensions, so batched training fails with RuntimeError: stack expects each tensor to be equal size, but got [1, 2048, 18, 32] at entry 0 and [1, 2048, 19, 29] at entry 1

luo3300612 commented 2 years ago

After extraction, just use PyTorch's adaptive average pooling to pool every feature map to 7*7.

competent-s commented 2 years ago

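But the extracted features are four-dimensional, and I am not sure how to convert them to (49, 2048). Could you share the source code you used to extract the grid features?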

luo3300612 commented 2 years ago

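The conversion is simple. For example, first use AdaptiveAvgPool2d to turn [1,2048,18,32] into [1,2048,7,7], then reorder the dimensions BCHW->BHWC to get [1,7,7,2048], and finally merge the first three dimensions to get 49*2048.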
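
The three steps described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual extraction script: the helper name `grid_to_sequence` and the random input tensors are assumptions made for the example, and it uses the functional form `F.adaptive_avg_pool2d` in place of the `AdaptiveAvgPool2d` module.

```python
import torch
import torch.nn.functional as F


def grid_to_sequence(feat: torch.Tensor, grid_size: int = 7) -> torch.Tensor:
    """Hypothetical helper: pool a [1, C, H, W] grid feature to [grid_size**2, C]."""
    # Step 1: adaptive average pooling, [1, C, H, W] -> [1, C, 7, 7]
    pooled = F.adaptive_avg_pool2d(feat, (grid_size, grid_size))
    # Step 2: reorder BCHW -> BHWC, giving [1, 7, 7, C]
    pooled = pooled.permute(0, 2, 3, 1)
    # Step 3: merge the first three dimensions, giving [49, C]
    return pooled.reshape(-1, pooled.shape[-1])


# Random stand-ins for the two feature maps named in the error message.
a = grid_to_sequence(torch.randn(1, 2048, 18, 32))
b = grid_to_sequence(torch.randn(1, 2048, 19, 29))

# Both features now have the same shape, so stacking into a batch works.
batch = torch.stack([a, b])
print(batch.shape)  # torch.Size([2, 49, 2048])
```

Pooling each feature once at extraction time and saving the [49, 2048] result means the training data loader never sees variable-sized tensors.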

competent-s commented 2 years ago

Thank you for your reply. Have a nice day.