Hi! I've observed the following when using Unsloth.
Summary
When fine-tuning the Unsloth Phi-3.5 model with LoRA, the number of trainable parameters is approximately 3x higher than with the Microsoft Phi-3.5 implementation, despite using identical hyperparameters and target modules.
Details
Microsoft Phi-3.5 Model
Using the following configuration:
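For reference, the setup follows the standard transformers + PEFT LoRA pattern. In the sketch below, the rank, alpha, dropout, and target-module names are illustrative placeholders rather than an exact record of the configuration:

```python
# Illustrative sketch only -- hyperparameters and target modules are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                      # placeholder rank
    lora_alpha=16,             # placeholder alpha
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    # Placeholder list; the transformers Phi-3 implementation exposes
    # fused qkv_proj and gate_up_proj layers under these names.
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```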
Output:
Unsloth-Based Setup
Configuration:
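The Unsloth side follows the usual FastLanguageModel pattern; again, the values below are illustrative placeholders rather than the exact configuration:

```python
# Illustrative sketch only -- same placeholder rank/alpha as above.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-3.5-mini-instruct",
    max_seq_length=2048,       # placeholder
    dtype=None,
    load_in_4bit=False,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                      # placeholder rank
    lora_alpha=16,             # placeholder alpha
    lora_dropout=0.0,
    bias="none",
    # Unsloth's Phi-3 port exposes unfused q/k/v and gate/up projections.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

model.print_trainable_parameters()
```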
Output:
As you can see, there is a huge discrepancy in the number of trainable parameters.
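For comparison purposes, a library-agnostic count (plain PyTorch, independent of each library's own reporting) looks like this; `count_trainable_params` is just a throwaway helper, not part of either API:

```python
# Sketch: count trainable parameters the same way for both setups,
# independent of peft/unsloth reporting.
def count_trainable_params(model) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Usage (after wrapping each model with its LoRA adapters):
# print(f"HF PEFT: {count_trainable_params(hf_peft_model):,}")
# print(f"Unsloth: {count_trainable_params(unsloth_model):,}")
```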
Am I doing something wrong, or is this unintended behaviour?
Thank you for your ongoing work!