jvoas655 / AViD-SP

A method to generate visual scene graphs conditioned on images paired with iterative spoken utterances, produced alongside the VG-SPICE dataset.
GNU General Public License v3.0

Inquiry on GPU Requirements #1

Open NingJinzhong opened 4 months ago

NingJinzhong commented 4 months ago

Hi, could you let me know how much VRAM is needed to train your model? Thanks!

jvoas655 commented 4 months ago

We train on multiple L40 or A40 GPUs (48 GB each), but a single 48 GB GPU is sufficient to train the model, given enough time. We have not tested GPUs with less VRAM, though training with smaller batch sizes may be feasible on them.
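To illustrate the batch-size trade-off mentioned above, here is a minimal sketch of the arithmetic behind gradient accumulation, a common way to keep the effective batch size while fitting fewer samples in memory at once. It is not taken from the AViD-SP code: the function name `plan_accumulation` and the example batch sizes are hypothetical, and whether the training script supports accumulation is an assumption to check.

```python
import math
from typing import Tuple


def plan_accumulation(target_batch_size: int, max_micro_batch_size: int) -> Tuple[int, int]:
    """Split a target batch into micro-batches that fit in memory.

    Returns (micro_batch_size, accumulation_steps). The product is at least
    target_batch_size and may slightly exceed it when the split is uneven.
    Both arguments are hypothetical illustration values, not AViD-SP settings.
    """
    if target_batch_size < 1 or max_micro_batch_size < 1:
        raise ValueError("batch sizes must be positive integers")
    # Number of forward/backward passes before each optimizer step.
    steps = math.ceil(target_batch_size / max_micro_batch_size)
    # Spread the samples as evenly as possible across those passes.
    micro = math.ceil(target_batch_size / steps)
    return micro, steps


# Example: a batch of 64 that only fits 16 samples at a time.
micro, steps = plan_accumulation(64, 16)
print(f"micro-batch {micro}, accumulate over {steps} steps")
```

Accumulating gradients this way lowers the activation memory per pass but not the memory taken by the model weights and optimizer state, so it does not guarantee that training fits on a smaller GPU.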