jchenghu / ExpansionNet_v2

Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
https://arxiv.org/abs/2208.06551
MIT License

online testing CUDA overflow #21

Closed PanYuQi66666666 closed 1 month ago

PanYuQi66666666 commented 1 month ago

Hello, I'm using the Ensemble model for online testing on a single A100-40G and I get a CUDA memory overflow. Is there a good way to solve this problem? What should I do if I increase the number of A100-40Gs to two?

jchenghu commented 1 month ago

Hi, so you get a CUDA memory overflow in the single-GPU case. I think the simplest solution would be reducing --eval_parallel_batch_size, which defaults to 16. You may want to lower this value:

Something like

python test.py --N_enc 3 --N_dec 3 --model_dim 512 \
    ...
    --eval_parallel_batch_size 4 \
    ...

Or even lower if needed. Let me know if it helps or if you have already tried this solution.
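For intuition, here is a minimal sketch (not the repository's actual code; the model and the generate call are hypothetical) of why a smaller eval_parallel_batch_size lowers peak memory: captions are generated in chunks of that size, so only one chunk of features lives on the GPU at a time and peak allocation scales with the chunk size rather than the whole evaluation set.

    import torch

    def evaluate_in_chunks(model, image_features, eval_parallel_batch_size=4):
        # Hypothetical helper: run caption generation chunk by chunk so that
        # peak GPU memory scales with eval_parallel_batch_size, not the full set.
        model.eval()
        all_captions = []
        with torch.no_grad():
            for start in range(0, image_features.size(0), eval_parallel_batch_size):
                chunk = image_features[start:start + eval_parallel_batch_size].cuda()
                captions = model.generate(chunk)  # hypothetical generation call
                all_captions.extend(captions)
                del chunk
                torch.cuda.empty_cache()
        return all_captions

Lowering the value trades evaluation speed for memory, which is usually acceptable for online testing.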

jchenghu commented 1 month ago

Hi, I'm closing the issue. I assume it was solved, based on the other thread.