tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

Fine-tune argument resume_from_checkpoint starts from scratch instead of from checkpoint #585

Open prpercival opened 9 months ago

prpercival commented 9 months ago

In finetune.py there is the following section to support resuming from a checkpoint, but note that resume_from_checkpoint is set to False whenever pytorch_model.bin does not exist, even though the code also supports a checkpoint_name of adapter_model.bin. This causes finetune.py to start training from scratch even when a seemingly valid resume_from_checkpoint argument is supplied.

If I move line 200 into the if/else on lines 204 and 208, the script resumes my fine-tune from adapter_model.bin as expected. Is there something I'm missing here? Should I not be resuming from certain checkpoints?

if resume_from_checkpoint:
    # Check the available weights and load them
    checkpoint_name = os.path.join(
        resume_from_checkpoint, "pytorch_model.bin"
    )  # Full checkpoint
    if not os.path.exists(checkpoint_name):
        checkpoint_name = os.path.join(
            resume_from_checkpoint, "adapter_model.bin"
        )  # only LoRA model - LoRA config above has to fit
        resume_from_checkpoint = (
            False  # So the trainer won't try loading its state
        )
    # The two files above have a different name depending on how they were saved, but are actually the same.
    if os.path.exists(checkpoint_name):
        print(f"Restarting from {checkpoint_name}")
        adapters_weights = torch.load(checkpoint_name)
        set_peft_model_state_dict(model, adapters_weights)
    else:
        print(f"Checkpoint {checkpoint_name} not found")

to

if resume_from_checkpoint:
    # Check the available weights and load them
    checkpoint_name = os.path.join(
        resume_from_checkpoint, "pytorch_model.bin"
    )  # Full checkpoint
    if not os.path.exists(checkpoint_name):
        checkpoint_name = os.path.join(
            resume_from_checkpoint, "adapter_model.bin"
        )  # only LoRA model - LoRA config above has to fit
    # The two files above have a different name depending on how they were saved, but are actually the same.
    if os.path.exists(checkpoint_name):
        print(f"Restarting from {checkpoint_name}")
        adapters_weights = torch.load(checkpoint_name)
        set_peft_model_state_dict(model, adapters_weights)
        resume_from_checkpoint = True
    else:
        print(f"Checkpoint {checkpoint_name} not found")
        resume_from_checkpoint = (
            False  # So the trainer won't try loading its state
        )
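
For context, whatever value is left in resume_from_checkpoint is what finetune.py later forwards to the trainer; paraphrased (not a verbatim excerpt), the downstream call is roughly:

# Paraphrased from later in finetune.py, not a verbatim excerpt: the value
# left in resume_from_checkpoint is handed to the Hugging Face Trainer.
# False/None means "do not load trainer state"; True or a checkpoint path
# makes the Trainer also look for optimizer/scheduler state, which an
# adapter_model.bin-only checkpoint does not provide.
trainer.train(resume_from_checkpoint=resume_from_checkpoint)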

amina-mardiyyah commented 5 months ago

I am also quite curious about this; if someone could provide insight, I would appreciate it. I am training on a GPU cluster that runs my code for a limited amount of time. Afterward, I have to resubmit the job to get a node again and resume training. It is not efficient if training restarts from scratch every time I am disconnected. Is there a better way to handle this?
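
For what it's worth, the generic Hugging Face Trainer pattern for surviving job time limits is to write periodic checkpoints and pick up the newest one on relaunch. Below is a minimal sketch, independent of alpaca-lora's finetune.py; the output_dir, the step counts, and the model/train_data objects are placeholders, not the script's actual configuration.

import os

from transformers import Trainer, TrainingArguments
from transformers.trainer_utils import get_last_checkpoint

training_args = TrainingArguments(
    output_dir="lora-alpaca",  # checkpoints are written to lora-alpaca/checkpoint-*
    save_strategy="steps",
    save_steps=200,            # write a checkpoint every 200 optimizer steps
    save_total_limit=3,        # keep only the three newest checkpoints
)

# model and train_data are assumed to be built the same way the script already does.
trainer = Trainer(model=model, args=training_args, train_dataset=train_data)

# On the first submission there is no checkpoint yet, so this returns None and
# training starts from scratch; after a requeue it returns the newest
# checkpoint-* directory, and the Trainer restores model, optimizer, and
# scheduler state from it.
last_checkpoint = (
    get_last_checkpoint(training_args.output_dir)
    if os.path.isdir(training_args.output_dir)
    else None
)
trainer.train(resume_from_checkpoint=last_checkpoint)

With this pattern, a job killed by the cluster's time limit continues from the last saved step on the next submission instead of restarting from zero.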