wang-zhanyu / R2GenGPT

Radiology Report Generation with Frozen LLMs
BSD 3-Clause "New" or "Revised" License

Deep not learning #2

Closed: sergiotasconmorales closed this issue 10 months ago

sergiotasconmorales commented 11 months ago

Hello, I managed to run SHALLOW and DELTA, but when I run DEEP with

python -u train.py \
    --dataset ${dataset} \
    --annotation ${annotation} \
    --base_dir ${base_dir} \
    --batch_size 4 \
    --val_batch_size 6 \
    --freeze_vm False \
    --vis_use_lora False \
    --llm_use_lora False \
    --savedmodel_path ${savepath} \
    --learning_rate 1e-4 \
    --gradient_clip_val 1 \
    --max_length 100 \
    --min_new_tokens 80 \
    --max_new_tokens 120 \
    --repetition_penalty 2.0 \
    --length_penalty 2.0 \
    --num_workers 4 \
    --devices 4 \
    --max_epochs 5 \
    --limit_val_batches 0.5 \
    --val_check_interval 0.5 \
    --num_sanity_val_steps 2 \
    --strategy "ddp"\
    --low_resource True \
    2>&1 |tee -a ${savepath}/log.txt

The metrics do not improve. At epoch 4 the values are {'Bleu_1': 0.05970959847989339, 'Bleu_2': 6.25814724399501e-11, 'Bleu_3': 6.408181401962741e-14, 'Bleu_4': 2.063290001168864e-15, 'ROUGE_L': 0.059111853619422554, 'CIDEr': 0.0006551277647106657}

As far as I know, the only change in the config with respect to SHALLOW is freeze_vm=False.

Do you use warm-up iterations for this case or something? Could you please help me?
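For context, the freeze_vm flag presumably just controls whether the Swin encoder's parameters receive gradients; a minimal sketch of that assumption (the visual_encoder name is illustrative, not necessarily the attribute used in the repo):

import torch.nn as nn

def set_visual_encoder_trainable(visual_encoder: nn.Module, freeze_vm: bool) -> None:
    # DEEP sets freeze_vm=False, so the Swin weights are updated alongside the
    # projection layers; SHALLOW keeps them frozen.
    for param in visual_encoder.parameters():
        param.requires_grad = not freeze_vm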

wang-zhanyu commented 11 months ago

Hi, thanks for your interest. I didn't use a warm-up strategy; the training method is the same as for Shallow and Delta. Is your training loss converging normally?

sergiotasconmorales commented 11 months ago

Hello, thank you for replying. I'm attaching the beginning of the metrics.csv file produced by your code. It looks like the loss starts low, then hits some NaNs, and then oscillates around 10.0 for the rest of the training.

I tried the code with the configuration mentioned above but with freeze_vm set to True, and it works well (it effectively becomes SHALLOW, I guess). So it seems that unfreezing the weights of the Swin transformer is what causes the issue.

[screenshot of metrics.csv]

wang-zhanyu commented 11 months ago

Hi, sergio. The loss isn't settling down, likely because of the NaN values in it. I suspect this might have something to do with the low_resource setting. In my own tests, I set low_resource to False and kept all other settings the same as yours. Could you try the following configuration to see if it resolves the issue? Your RTX 3090 should be able to run it.

python -u train.py \
    --dataset ${dataset} \
    --annotation ${annotation} \
    --base_dir ${base_dir} \
    --batch_size 4 \
    --val_batch_size 6 \
    --freeze_vm False \
    --vis_use_lora False \
    --llm_use_lora False \
    --savedmodel_path ${savepath} \
    --learning_rate 1e-4 \
    --gradient_clip_val 1 \
    --max_length 100 \
    --min_new_tokens 80 \
    --max_new_tokens 120 \
    --repetition_penalty 2.0 \
    --length_penalty 2.0 \
    --num_workers 4 \
    --devices 4 \
    --max_epochs 5 \
    --limit_val_batches 0.5 \
    --val_check_interval 0.5 \
    --num_sanity_val_steps 2 \
    --strategy "ddp" \
    --global_only True \
    --low_resource False \
    2>&1 |tee -a ${savepath}/log.txt

sergiotasconmorales commented 11 months ago

Hi, thanks for the reply. Unfortunately, the RTX 3090 can't handle even SHALLOW or DELTA with low_resource set to False. I know that in theory it should, considering Llama stays frozen, but I get a CUDA out-of-memory error every time I try. I'm still trying to make DEEP work with low_resource=True. If you have any suggestions about what I could change to solve the problem, please let me know. Thanks.

sergiotasconmorales commented 11 months ago

Hi, I tried again and the model now runs with low_resource=False, and DEEP seems to be learning. For anyone who runs into issues related to the location of tensors, please check out my fork of this repo: I added an autocast line in the validation_step method, which seems to have solved the issue. I'm not sure whether that was the only change the code required, so have a look at the fork if you have trouble running the code on smaller GPUs. Thank you, authors, for sharing the code.
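A minimal sketch of what such an autocast wrapper in a Lightning validation_step can look like (illustrative names only, not copied from the fork):

import torch
import pytorch_lightning as pl

class R2GenLitSketch(pl.LightningModule):
    # Illustrative module; R2GenGPT's actual class and forward call differ.
    def validation_step(self, batch, batch_idx):
        # Running the step under autocast keeps the half-precision LLM weights
        # and the float32 visual features in compatible dtypes on the GPU.
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = self(batch)  # placeholder for the repo's forward/generate call
        self.log("val_loss", loss, sync_dist=True)
        return loss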

sergiotasconmorales commented 10 months ago

Dear author, can you comment on how necessary it is to have global_only=True for DEEP? In the .sh file in your repo the parameter is not set (the default is False), but in the config you shared above it is set to True. Is it necessary to set it to True? I managed to train DEEP successfully only once, with results lower than those of SHALLOW and DELTA. The training does not seem stable, as NaNs appear in the loss again. Everything I'm describing here is with low_resource=False.

wang-zhanyu commented 10 months ago

Hi, sergio. When global_only is set to True, the sequence length of the LLM input is reduced, which saves some GPU memory. I have tested this configuration on a 3090 GPU and it works. It is not necessary when you have GPUs with more memory.
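A rough sketch of that idea (purely illustrative; the repo's actual feature handling may differ): with global_only the LLM receives a single pooled visual token per image instead of every Swin patch token.

import torch

def visual_prompt_tokens(patch_feats: torch.Tensor, global_only: bool) -> torch.Tensor:
    # patch_feats: (batch, num_patches, dim) features from the Swin encoder.
    # Feeding only a pooled global token shortens the LLM input sequence,
    # which is where the memory saving comes from.
    if global_only:
        return patch_feats.mean(dim=1, keepdim=True)  # (batch, 1, dim)
    return patch_feats                                # (batch, num_patches, dim)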

As for the NaNs issue, I haven't encountered it yet, but I think it might be due to insufficient precision causing an overflow. You can try mixed-precision training (by setting precision to bf16-mixed) or full-precision training (by setting precision to 32) to see if that helps.
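In PyTorch Lightning terms these map onto the Trainer's precision argument (a sketch assuming Lightning 2.x naming; older releases spell the bf16 option slightly differently):

import pytorch_lightning as pl

# bf16 mixed precision keeps fp16-like memory use but has a wider dynamic
# range, which often avoids the overflows that surface as NaN losses.
trainer = pl.Trainer(devices=4, strategy="ddp", precision="bf16-mixed")

# Full fp32 precision rules out numerical issues entirely, at a memory cost.
# trainer = pl.Trainer(devices=4, strategy="ddp", precision=32)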

sergiotasconmorales commented 10 months ago

Thanks, I will try changing the precision.

sergiotasconmorales commented 10 months ago

Hi, changing the precision seems to have solved the issue. I am currently trying to turn the model into a VQA model, without much success. Fine-tuning on the VQA dataset using the question as the prompt doesn't work well, since the model tends to produce outputs that are too long. I tried changing min_new_tokens to 1 and max_new_tokens to 15, which improved the answers, but not to an acceptable level. Do you have any idea which generation parameters would need to be changed to encourage the model to produce shorter answers (most of the time one-word answers)?
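For reference, assuming generation goes through a Hugging Face-style generate() call, these are the usual knobs for pushing a model toward short answers (values are starting points to tune, not a tested VQA recipe):

from transformers import BatchEncoding, PreTrainedModel

def generate_short_answer(model: PreTrainedModel, inputs: BatchEncoding):
    # Settings that usually favor short, VQA-style answers; the 2.0
    # repetition/length penalties used for full reports work against this.
    return model.generate(
        **inputs,
        max_new_tokens=15,       # hard cap on answer length
        min_new_tokens=1,        # allow one-word answers
        num_beams=3,
        length_penalty=0.0,      # stop rewarding longer beam hypotheses
        repetition_penalty=1.0,  # default; 2.0 is tuned for long reports
        early_stopping=True,     # end beams as soon as EOS is produced
    )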