tjdevWorks / TEASEL

Training on a single GPU #3

Closed · 257556227 closed 2 years ago

257556227 commented 2 years ago

Nice work and a nice repository! But I still have a few questions about it.

  1. Can I run it on a single GPU? Even when I run your work (`python fine_tune_mosi.py --config config_mosi.yaml`) on two GPUs, it always returns CUDA out of memory. My GPU is a Tesla V100, which shows `RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 15.78 GiB total capacity; 14.18 GiB already allocated; 61.50 MiB free)`. Of course, I also tried setting the batch size to 1 or 2. Unlike the training module, I don't quite understand why the fine-tuning module needs so much memory. Can you tell me how I should change the batch size, or what other modifications would support training on a single GPU? For example, how many epochs are recommended when training on a single GPU in order to reach pretraining convergence? I see in the code that the optimizer is always AdamW; did you use Apex with FP16 for training or inference? And are there any other tricks in the training?

  2. Can I still run the code after removing the calls to the wandb interface? I'm sorry, I'm a novice in deep learning and don't understand wandb's built-in mechanisms, so I can't use it skillfully. When I submit jobs to a multi-GPU cluster, I don't know how to enter the API key, which leads to `wandb.errors.UsageError: api key not configured (no-tty). call wandb.login(key=[your_api_key])`, so I can only choose "wandb: (3) Don't visualize my results" (see the sketch right after this list).
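For reference, the wandb calls don't need to be deleted at all; wandb can be disabled outright at init time. A minimal sketch (the `wandb.init` project/name values below are placeholders, not taken from this repo):

```python
import os

# Option 1: disable wandb globally before the training script runs.
# Every wandb call becomes a no-op, so no code needs to be removed.
os.environ["WANDB_MODE"] = "disabled"

import wandb

# Option 2: disable at init time (project/name are placeholder values,
# not the repo's actual arguments).
wandb.init(project="teasel-mosi", name="single-gpu-run", mode="disabled")

# Option 3: keep logging on a no-TTY cluster by supplying the key
# through the environment instead of an interactive login:
# os.environ["WANDB_API_KEY"] = "<your_api_key>"
```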

I'm fairly new to this, and I appreciate the help. Thank you!

Vvvvvvsysy commented 2 years ago

[screenshot of the argument parser] The command-line arguments will override the hyperparameters in config_mosi.yaml. Modify this line and reduce the batch size: `parser.add_argument('--batch_size', default=16, type=int, help='Set Batch Size')`
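If lowering the batch size alone isn't enough, two standard PyTorch tricks cut memory further: gradient accumulation and mixed precision via `torch.cuda.amp`. A minimal sketch of both combined (generic PyTorch, not code from this repo; `model`, `dataloader`, and `criterion` stand in for the repo's actual objects):

```python
import torch

# model, dataloader, and criterion are placeholders for the repo's
# actual training objects; this is a generic pattern, not TEASEL code.
scaler = torch.cuda.amp.GradScaler()   # loss scaling for FP16 stability
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
accum_steps = 8                        # effective batch = micro-batch * 8

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(dataloader):
    inputs, labels = inputs.cuda(), labels.cuda()
    # autocast runs the forward pass in FP16 where it is numerically
    # safe, roughly halving activation memory.
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs), labels) / accum_steps
    scaler.scale(loss).backward()
    # Step only every accum_steps micro-batches, emulating a larger
    # batch at a fraction of the peak memory.
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```

With a micro-batch of 1 or 2 and `accum_steps` scaled up accordingly, the effective batch size stays close to the multi-GPU setup while peak memory drops substantially.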

257556227 commented 2 years ago

Oh, thanks a lot for the advice. This is very useful to me.

Vvvvvvsysy commented 2 years ago

You're welcome! Your English is really good; may I ask how you learned it?

257556227 commented 2 years ago

I just half-translate and half-write it, haha; I'm no match for you! I hadn't looked at this issue in a long time, sorry for the late reply. @Vvvvvvsysy

cameroncvpr commented 10 months ago

May I ask whether you ever solved this problem? I'm also on a single V100 and keep getting out-of-memory errors.