jchenghu / ExpansionNet_v2

Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
https://arxiv.org/abs/2208.06551
MIT License

Hello, I have a question #17

Closed: PanYuQi66666666 closed this issue 2 months ago

PanYuQi66666666 commented 2 months ago

Regarding the Block Static Expansion described in the paper: does it extend the input sequence to each of the lengths {32, 64, 128, 256, 512} and then perform the calculations? In the code, however, I see these lengths added up to 992 (32 + 64 + 128 + 256 + 512). I am very confused. Could you please explain this part in detail? Thank you.

jchenghu commented 2 months ago

Hi, we sum the group lengths during the forward expansion to allow parallelization, and this does not affect the final result. During the backward expansion, in contrast, each group is treated separately.
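A toy illustration of why summing the lengths can leave the result unchanged. This is a plain-Python sketch, not the repository's code: `forward_expand`, `split_groups`, the attention-style scoring, and the sizes `D` and `N_INPUT` are all hypothetical stand-ins. It assumes each expanded row depends only on its own expansion query. Under that assumption, running all 992 queries in one call and then cutting the output into blocks of 32, 64, 128, 256 and 512 rows gives the same rows as running each group on its own.

```python
import math
import random

# Group lengths quoted in the issue; their sum is the 992 seen in the code.
GROUP_LENGTHS = [32, 64, 128, 256, 512]
TOTAL = sum(GROUP_LENGTHS)  # 992
D = 4        # toy feature size (assumption)
N_INPUT = 6  # toy input sequence length (assumption)


def forward_expand(queries, inputs):
    """Toy forward expansion (hypothetical): every query row attends over
    the input sequence and returns one D-dim vector. The softmax runs over
    the input axis, so each output row depends only on its own query."""
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, x)) for x in inputs]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wj * x[k] for wj, x in zip(w, inputs)) / z
                    for k in range(D)])
    return out


def split_groups(rows, lengths):
    """Cut a fused sequence of sum(lengths) rows back into consecutive groups."""
    groups, start = [], 0
    for n in lengths:
        groups.append(rows[start:start + n])
        start += n
    return groups


rng = random.Random(0)
inputs = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N_INPUT)]
# One query table with TOTAL rows, viewed as consecutive per-group blocks.
queries = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(TOTAL)]
query_groups = split_groups(queries, GROUP_LENGTHS)

# Option A: expand each group on its own (five separate calls).
separate = [forward_expand(qg, inputs) for qg in query_groups]

# Option B: expand all 992 queries in one call, then split by group length.
fused = split_groups(forward_expand(queries, inputs), GROUP_LENGTHS)

# The two options agree row for row; only option B is a single batched call.
same = all(
    math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
    for ga, gb in zip(separate, fused)
    for ra, rb in zip(ga, gb)
    for a, b in zip(ra, rb)
)
print(TOTAL, [len(g) for g in fused], same)  # 992 [32, 64, 128, 256, 512] True
```

The equivalence in this sketch rests on the softmax normalizing over the input positions and never across the queries, so concatenating the groups changes nothing row by row. Any step that mixes rows within a group would have to run after the split, which fits the reply's remark that the backward expansion handles each group separately.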

jchenghu commented 2 months ago

Hi, I'm assuming this point has been clarified. Feel free to re-open the issue if that's not the case.