pradeepdev-1995 opened 7 months ago
Here is the SFTTrainer setup I used for fine-tuning Mistral:
```python
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=data,
    peft_config=peft_config,
    dataset_text_field="column name",
    max_seq_length=3000,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
trainer.train()
```
I found several different mechanisms for running inference with the fine-tuned model after PEFT-based LoRA fine-tuning.
Method - 1
Save the adapter after training completes, merge it with the base model, then use the merged model for inference:
```python
trainer.model.save_pretrained("new_adapter_path")

from peft import PeftModel

finetuned_model = PeftModel.from_pretrained(
    base_model,
    "new_adapter_path",
    torch_dtype=torch.float16,
    is_trainable=False,
    device_map="auto",
)
finetuned_model = finetuned_model.merge_and_unload()
```
Method - 2
Save checkpoints during training, then load the checkpoint with the lowest loss:
```python
from peft import PeftModel

finetuned_model = PeftModel.from_pretrained(
    base_model,
    "least loss checkpoint path",
    torch_dtype=torch.float16,
    is_trainable=False,
    device_map="auto",
)
finetuned_model = finetuned_model.merge_and_unload()
```
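For this method you first need to identify which checkpoint has the lowest loss. A minimal sketch of one way to do that, assuming each `checkpoint-*` folder contains the standard `trainer_state.json` that `transformers.Trainer` writes (the helper name `find_best_checkpoint` is mine, not from any library):

```python
import glob
import json
import os


def find_best_checkpoint(output_dir):
    """Return the checkpoint directory whose most recent logged training
    loss is lowest, by scanning trainer_state.json in each checkpoint-*
    folder under output_dir."""
    best_path, best_loss = None, float("inf")
    for ckpt in glob.glob(os.path.join(output_dir, "checkpoint-*")):
        state_file = os.path.join(ckpt, "trainer_state.json")
        if not os.path.isfile(state_file):
            continue
        with open(state_file) as f:
            state = json.load(f)
        # log_history holds dicts like {"loss": ..., "step": ...};
        # take the last loss entry as that checkpoint's training loss.
        losses = [e["loss"] for e in state.get("log_history", []) if "loss" in e]
        if losses and losses[-1] < best_loss:
            best_loss, best_path = losses[-1], ckpt
    return best_path
```

The returned path can then be passed to `PeftModel.from_pretrained` in place of the placeholder above. Note that training loss is a rough proxy; evaluation loss (also in `log_history`, under `eval_loss`) is usually the better selection criterion when you log it.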
Method - 3
The same approach, but using the AutoPeftModelForCausalLM class:
```python
from peft import AutoPeftModelForCausalLM

finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    "output directory checkpoint path",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cuda",
)
finetuned_model = finetuned_model.merge_and_unload()
```
Method-4
Point AutoPeftModelForCausalLM at the output folder itself, without specifying a particular checkpoint:
```python
from peft import AutoPeftModelForCausalLM

finetuned_model = AutoPeftModelForCausalLM.from_pretrained(
    training_args.output_dir,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
finetuned_model = finetuned_model.merge_and_unload()
```
Method - 5
Any of the above methods, but skipping the merge step, i.e. keeping the adapter attached and omitting:
```python
# finetuned_model = finetuned_model.merge_and_unload()
```
Which of these methods should I actually use for inference, and when is one preferable over another?