jpthu17 / DiCoSA

[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Apache License 2.0
44 stars · 2 forks

Training on one GPU #6

Closed adrianofragomeni closed 5 months ago

adrianofragomeni commented 5 months ago

Hello, thank you for the repo, and well done on the project.

Is it possible to train on a single GPU, and if so, how?

jpthu17 commented 5 months ago

You can train on a single GPU using:

```sh
CUDA_VISIBLE_DEVICES=0 \
python -m torch.distributed.launch \
--master_port 2502 \
--nproc_per_node=1 \
main_retrieval.py \
--do_train 1 \
--workers 8 \
--n_display 50 \
--epochs 5 \
--lr 1e-4 \
--coef_lr 1e-3 \
--batch_size 128 \
--batch_size_val 128 \
--anno_path data/MSR-VTT/anns \
--video_path ${DATA_PATH}/MSRVTT_Videos \
--datatype msrvtt \
--max_words 32 \
--max_frames 12 \
--video_framerate 1 \
--output_dir ${OUTPUT_PATH} \
--center 8 \
--temp 3 \
--alpha 0.01 \
--beta 0.005
```

If you run out of GPU memory, consider freezing the CLIP parameters:

```python
for param in self.clip.parameters():
    param.requires_grad = False  # freeze: exclude from gradient updates
adrianofragomeni commented 5 months ago

Thank you