When running with LoftQ on TinyLlama, performance is worse than with QLoRA, and it degrades further the more iterations I use to initialize the LoftQ adapters (my grad norm gets worse with each additional iteration).
Is there any reason LoftQ wouldn't work with TinyLlama?
When working with Mistral, I found that:
- 1 iteration of LoftQ is slightly better than QLoRA,
- but 3 iterations of LoftQ is worse than QLoRA.

I'm using a rank of 32 and an alpha of 32. My base learning rate is 1e-4.
I am using unsloth:
```python
from peft import LoftQConfig
from unsloth import FastLanguageModel

if use_4bit and config['use_loftq']:
    loftq_config = LoftQConfig(loftq_bits=4, loftq_iter=1)
    init_lora_weights = "loftq"
else:
    loftq_config = None
    init_lora_weights = True

## Apply LoRA (if use_lora is True in the config)
if config.get('use_lora', False):
    model = FastLanguageModel.get_peft_model(
        model,
        r=config['lora_r'],
        lora_alpha=config['lora_alpha'],
        target_modules=config['lora_modules'],
        modules_to_save=config.get('other_trainable', None),
        lora_dropout=0,  # dropout = 0 is currently optimized
        bias="none",  # bias = "none" is currently optimized
        use_gradient_checkpointing=True,
        random_state=3407,
        use_rslora=True,
        loftq_config=loftq_config,
        init_lora_weights=init_lora_weights,
    )
```