Hi Haotian,
Thank you for your incredible work on this project.
I am encountering an issue during inference. When I use the non-LoRA weights for inference on ScienceQA, the speed is approximately 1 second per sample. However, when I switch to the LoRA fine-tuned model, the inference time increases drastically to over 40 seconds per sample.
Here is the command I am using for fine-tuning (trained on 1 V100 with lora_r=4, bf16=False, tf32=False):
Here is the command I am using for inference:
Could you please help me understand why there is such a large difference in inference speed between the two models?
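In case it helps clarify what I mean by the two setups: my understanding is that the non-LoRA checkpoint has the adapter already folded into the base weights, while the LoRA checkpoint keeps the adapter separate at inference time. Below is a minimal sketch of merging the adapter before inference, assuming a PEFT-style adapter (as suggested by the adapter_config.json); the paths and the generic Auto* classes are placeholders and not the actual LLaVA loading code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "path/to/base-model"        # placeholder
adapter_path = "path/to/lora-adapter"   # placeholder, folder with adapter_config.json

# Load the base model in fp16 (bf16=False in my run, since the V100 lacks bf16 support).
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_path)

# Attach the LoRA adapter, then fold its weights into the base layers so that
# inference runs on plain linear layers instead of LoRA-wrapped ones.
model = PeftModel.from_pretrained(base, adapter_path)
model = model.merge_and_unload()
model.eval()
```

If the slowdown simply comes from applying the adapter on the fly at inference time, I would expect merging like this to remove the overhead, but I may be misunderstanding the intended workflow.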
Thank you!
Screenshots:
adapter_config.json:
config.json: