huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
https://arxiv.org/pdf/2311.18445.pdf

About lora duplication #19

Open yeppp27 opened 7 months ago

yeppp27 commented 7 months ago

Hello! I followed your method to load different LoRA modules at different stages:

    model.get_model().initialize_vision_modules(model_args)
    model = load_lora(model, model_args.stage2_path)
    rank0_print('Merging LoRA weights...')
    model = model.merge_and_unload()

    print_trainable_parameters(model)

    rank0_print("Adding LoRA adapters...")
    model = get_peft_model(model, lora_config)

But when I print the parameters, I still only see one LoRA. Is there any trick in the code settings that I might be missing?

[screenshot: printed trainable parameters]
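For reference, here is a minimal sketch of the staged-LoRA flow being described, assuming the standard PEFT APIs (`PeftModel.from_pretrained`, `merge_and_unload`, `get_peft_model`); the base checkpoint name and the stage-2 adapter path are illustrative placeholders, not the repo's actual values:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel, LoraConfig, get_peft_model

    # Illustrative placeholders; substitute the checkpoints you actually use.
    base_model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5")
    stage2_lora_dir = "checkpoints/vtimellm-stage2"

    lora_config = LoraConfig(
        r=64, lora_alpha=16, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
    )

    # 1) Attach the stage-2 LoRA to the base model.
    model = PeftModel.from_pretrained(base_model, stage2_lora_dir)

    # 2) Fold the stage-2 LoRA into the base weights; the result is a plain
    #    transformers model with no adapters attached.
    model = model.merge_and_unload()

    # 3) Wrap the merged model with a fresh LoRA for stage-3 training.
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()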
huangb23 commented 7 months ago

I have tried printing the trainable parameters at the specified location as you mentioned. However, after calling merge_and_unload(), the LoRA adapters are not visible, contrary to what you've shown.

Could you please provide guidance on how to reproduce it?

yeppp27 commented 7 months ago

Thanks for your kind reply! This is how I print my trainable parameters:

for name, param in model.named_parameters():
    print(f"{name}: {'requires_grad' if param.requires_grad else 'no grad'}")

It can print the parameters under DeepSpeed ZeRO-2 mode. Hope it is helpful to you.

huangb23 commented 7 months ago

Can you start the training for stage 3? If the LoRA at this point hadn't been merged, then the subsequent line `model = get_peft_model(model, lora_config)` would throw an error.
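One quick way to confirm the merge happened (a sketch, not part of the repo's code): after `merge_and_unload()` no `lora_` parameters should remain in the model, so an empty list here means `get_peft_model` attaches a genuinely new adapter.

    # Expect an empty list after merge_and_unload(); any leftovers mean the
    # stage-2 LoRA is still attached as an adapter rather than merged.
    leftover = [name for name, _ in model.named_parameters() if "lora_" in name]
    print("unmerged LoRA params:", leftover)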

yeppp27 commented 7 months ago

Thanks for replying! It seems I didn't load the second LoRA at the line `model = get_peft_model(model, lora_config)`:

    step2 trainable params: 0 || all params: 3430316576 || trainable%: 0.00
    Adding LoRA adapters...
    step3 trainable params: 248061952 || all params: 3430316576 || trainable%: 7.23

The total parameter count does not change. TAT

huangb23 commented 7 months ago

The implementation of the `print_trainable_parameters` function seems to have a bug. When using DeepSpeed, `param.numel()` might return 0. I haven't found a solution for it yet. I'd appreciate any suggestions to address this. Nevertheless, this issue shouldn't hinder the training process.
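A possible workaround sketch, assuming the zero counts come from DeepSpeed partitioning the parameters (ZeRO-3 keeps the full size in `param.ds_numel`); this is not the repo's `print_trainable_parameters`, just an illustration:

    def count_parameters(model):
        # Count trainable / total parameters, falling back to ds_numel when a
        # parameter has been partitioned by DeepSpeed and its local numel() is 0.
        trainable, total = 0, 0
        for _, param in model.named_parameters():
            n = param.ds_numel if hasattr(param, "ds_numel") else param.numel()
            total += n
            if param.requires_grad:
                trainable += n
        print(f"trainable params: {trainable} || all params: {total} "
              f"|| trainable%: {100 * trainable / max(total, 1):.2f}")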

yeppp27 commented 7 months ago

The `named_parameters` loop above can still print the parameters under DeepSpeed ZeRO-2 mode. Hope it is helpful to you.