When running with LoftQ on TinyLlama, performance is worse than with QLoRA, and it degrades further the more iterations I use to initialize the LoftQ adapters (my grad norm gets worse with each additional iteration).
Is there any reason LoftQ wouldn't work with TinyLlama?
When working with Mistral, I found that:
- 1 iteration of LoftQ is slightly better than QLoRA,
- but 3 iterations of LoftQ is worse than QLoRA.

I'm using a rank of 32 and an alpha of 32. My base learning rate is 1e-4.
I am using unsloth:
```python
from peft import LoftQConfig
from unsloth import FastLanguageModel

if use_4bit and config['use_loftq']:
    loftq_config = LoftQConfig(loftq_bits=4, loftq_iter=1)
    init_lora_weights = "loftq"
else:
    loftq_config = None
    init_lora_weights = True

## Apply LoRA (if use_lora is True in the config)
if config.get('use_lora', False):
    model = FastLanguageModel.get_peft_model(
        model,
        r=config['lora_r'],
        lora_alpha=config['lora_alpha'],
        target_modules=config['lora_modules'],
        modules_to_save=config.get('other_trainable', None),
        lora_dropout=0,  # dropout = 0 is currently optimized
        bias="none",  # bias = "none" is currently optimized
        use_gradient_checkpointing=True,
        random_state=3407,
        use_rslora=True,
        loftq_config=loftq_config,
        init_lora_weights=init_lora_weights,
    )
```