texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

Problem with contrastive loss in pretrain stage #94

Open · tien-ngnvan opened this issue 8 months ago

tien-ngnvan commented 8 months ago

Thanks for your great work. I'm running into a problem when reusing the hyperparameters from the NQ example for second-stage pre-training, similar to coCondenser (we call this the uptrain stage, trained with a contrastive loss). Each training example contains 1 query, 1 positive passage, and 10 negative passages, loaded through our custom dataloader from a streaming dataset (two languages, 25M triplets). Our model is based on bert-base-multilingual-cased and has already gone through continued pre-training with the MLM loss. However, pre-training with the contrastive loss does not seem to converge. Here is the training script:

python -m torch.distributed.launch --nproc_per_node=8 -m asymmetric.train \
    --model_name_or_path 'asymmetric/checkpoint-10000' \
    --streaming \
    --output $saved_path \
    --do_train \
    --train_dir 'data/train' \
    --max_steps 10000 \
    --per_device_train_batch_size 32 \
    --dataset_num_proc 2 \
    --train_n_passages 8 \
    --gc_q_chunk_size 8 \
    --gc_p_chunk_size 64 \
    --untie_encoder \
    --negatives_x_device \
    --learning_rate 5e-4 \
    --weight_decay 1e-2 \
    --warmup_ratio 0.1 \
    --save_steps 1000 \
    --save_total_limit 20 \
    --logging_steps 50 \
    --q_max_len 128 \
    --p_max_len 384 \
    --fp16 \
    --report_to 'wandb' \
    --overwrite_output_dir
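
For context, the quantity being optimized here is (roughly) the in-batch softmax contrastive loss sketched below; this is a minimal illustration, not the toolkit's or asymmetric.train's actual code. Note also that, if the stock Tevatron data pipeline is used, --train_n_passages counts the positive plus its negatives, so a value of 8 would draw only 7 of the 10 mined negatives per query.

import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_reps, p_reps, n_passages):
    # q_reps: (B, d) query embeddings
    # p_reps: (B * n_passages, d) passage embeddings, grouped per query as
    #         [positive, negative_1, ..., negative_{n_passages - 1}]
    scores = q_reps @ p_reps.T                                        # (B, B * n_passages)
    target = torch.arange(q_reps.size(0), device=scores.device) * n_passages
    return F.cross_entropy(scores, target)                            # label = index of the positive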

[image: training loss curve]
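
As a side note on two of the flags above: --negatives_x_device typically means the query/passage embeddings are gathered from all 8 GPUs so each query is contrasted against the full cross-device in-batch pool, while the GradCache chunk sizes (--gc_q_chunk_size / --gc_p_chunk_size) only split the forward/backward computation to fit memory and leave the loss unchanged. A minimal sketch of such a gather, assuming a standard torch.distributed process group (the function name is illustrative, not Tevatron's):

import torch
import torch.distributed as dist

def gather_across_devices(local_reps):
    # all_gather returns tensors with no gradient history, so the local shard
    # is substituted back in to keep this rank's autograd path intact.
    gathered = [torch.empty_like(local_reps) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, local_reps.contiguous())
    gathered[dist.get_rank()] = local_reps
    return torch.cat(gathered, dim=0)   # (world_size * B, d) pooled representations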

luyug commented 6 months ago

To understand this better, can you elaborate on what hardware is used?