showlab / UniVTG

[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
https://arxiv.org/abs/2307.16715
MIT License

Training Detail for Pretrain #24

Open · EasonXiao-888 opened this issue 1 year ago

EasonXiao-888 commented 1 year ago

Hello, thanks for your great work. I want to confirm that the pretrained model is validated on the val set of the QVHighlights dataset, and that the ckpt is selected by comparing R1@0.3. Is that right? Also, could you please share the log file for pretraining?

QinghongLin commented 12 months ago

@EasonXiao-888 Yes, during pretraining I use the zero-shot QVHighlights results to monitor the training stage. The ckpt should be selected by mAP, which is more comprehensive than R1@0.3. For downstream tasks, though, I would suggest trying different ckpts on different benchmarks (e.g., zero-shot) to find the optimal one.
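For concreteness, selecting a ckpt by a monitored metric can be sketched as below. This is a minimal sketch: the per-epoch JSON dumps and the `mAP` key are assumptions for illustration, not UniVTG's actual logging format.

```python
import json
from pathlib import Path

def pick_best_ckpt(eval_dir: str, metric: str = "mAP"):
    """Scan per-epoch eval dumps and return the epoch with the best metric.

    Assumes each epoch writes metrics_epoch{N}.json holding a flat dict of
    QVHighlights-style numbers, e.g. {"mAP": 0.31, "R1@0.3": 0.55, ...};
    adapt the glob pattern and keys to the real log format.
    """
    best_epoch, best_score = None, float("-inf")
    for f in sorted(Path(eval_dir).glob("metrics_epoch*.json")):
        score = json.loads(f.read_text())[metric]
        if score > best_score:
            best_epoch, best_score = f.stem, score
    return best_epoch, best_score
```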

Sure, I can share the log with you, but it might take a few days to retrieve it. Please send me an email if I do not respond in time.

EasonXiao-888 commented 12 months ago

Okay, thanks a lot. But I have an additional question: when we use the "Curve" data for pretraining on an A100, training cannot start due to CPU memory problems. Have you encountered this problem?

QinghongLin commented 12 months ago

I think this may be due to the cache option --use_cache, which tries to load the whole pretraining corpus into memory. Can you try removing it from your training script?
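If removing `--use_cache` is not enough, one memory-friendly pattern (a sketch, not UniVTG's actual dataset class) is to index the JSONL by byte offsets and parse each line on demand, so only the offset table stays resident:

```python
import json
from torch.utils.data import Dataset

class LazyJsonlDataset(Dataset):
    """Index a large .jsonl file by byte offsets; parse one line per item.

    Only the integer offset table lives in memory, instead of the fully
    parsed pretraining corpus.
    """
    def __init__(self, path: str):
        self.path = path
        self.offsets = []
        offset = 0
        with open(path, "rb") as f:
            for line in f:  # binary mode, so len(line) is in bytes
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        with open(self.path, "rb") as f:
            f.seek(self.offsets[idx])
            return json.loads(f.readline())
```

Reopening the file per item is cheap relative to JSON decoding; a persistent per-worker file handle would avoid even that.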

RobertLuo1 commented 11 months ago

I encountered the same problem. I did not use the cache, and when I load the Curve data, num_workers must be set to 0; otherwise the same problem occurs. But with num_workers set to 0, the program is quite slow.
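A plausible explanation (an assumption, not confirmed against the code): DataLoader workers are forked from the main process, and Python's refcounting writes to every object a worker touches, so the copy-on-write pages backing a large list of dicts get duplicated in each worker, multiplying memory with num_workers. A common mitigation is to pack the annotations into numpy byte buffers, which forked workers can share without refcount writes; a sketch:

```python
import json
import numpy as np
from torch.utils.data import Dataset

class PackedJsonlDataset(Dataset):
    """Hold the corpus as one read-only numpy byte buffer plus line offsets.

    Forked DataLoader workers share these arrays copy-on-write with no
    per-object refcounting, so memory does not multiply with num_workers.
    """
    def __init__(self, path: str):
        with open(path, "rb") as f:
            self.buf = np.frombuffer(f.read(), dtype=np.uint8)
        newlines = np.flatnonzero(self.buf == ord("\n"))
        self.starts = np.concatenate(([0], newlines[:-1] + 1))
        self.ends = newlines  # assumes the file ends with a newline

    def __len__(self):
        return len(self.ends)

    def __getitem__(self, idx):
        raw = self.buf[self.starts[idx]:self.ends[idx]].tobytes()
        return json.loads(raw)
```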

QinghongLin commented 11 months ago

@EasonXiao-888 @RobertLuo1 Can you provide the detailed error output and the matching code line so I can understand it better? Thank you.

Aarontncl commented 8 months ago

Same problem here. My program always gets stuck when loading the "curve_5_window.jsonl" file into the dataset. I used DDP and tried setting num_workers=0, but it still didn't work. I wonder what CPU hardware environment was used for the pretraining; it seems that the pretraining has very high CPU memory requirements. Thank you. @QinghongLin
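Not a confirmed answer from the authors, but one workaround that may be worth trying under DDP: let each rank parse only its own slice of the JSONL, so no single process ever materializes the full curve_5_window.jsonl. A hypothetical sketch; pair it with a plain, non-distributed sampler, since the data is already pre-sharded per rank:

```python
import json
import torch.distributed as dist

def load_rank_shard(path: str):
    """Parse only every world_size-th line of the corpus on each DDP rank."""
    rank = dist.get_rank() if dist.is_initialized() else 0
    world = dist.get_world_size() if dist.is_initialized() else 1
    shard = []
    with open(path, "r") as f:
        for i, line in enumerate(f):
            if i % world == rank:
                shard.append(json.loads(line))
    return shard
```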