pytorch / torchtune

A Native-PyTorch Library for LLM Fine-tuning
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License
4.06k stars 375 forks

[config] have a single output_dir #1265

Open felipemello1 opened 1 month ago

felipemello1 commented 1 month ago

Currently our configs have:

output_dir: /tmp/alpaca-gemma-finetune
checkpointer:
  checkpoint_dir: /tmp/gemma-2b/
  output_dir: /tmp/gemma-2b

metric_logger:
  log_dir: ${output_dir}

Instead of creating 4 separate paths, we should have one path and build the others on top of it, e.g.:

output_dir: /tmp/gemma-2b-lora
checkpointer:
  load_dir: ${output_dir}/checkpoint/
  save_dir: ${output_dir}/checkpoint_output/

metric_logger:
  log_dir: ${output_dir}/metrics/

Also, we should probably rename checkpointer.output_dir to checkpointer.checkpoint_output_dir.

RdoubleA commented 1 month ago

Can we please rename the checkpoint directories to save_dir or load_dir or something clearer? This has always been confusing to me.

SalmanMohammadi commented 1 month ago

Thank you @felipemello1 :) the "fixing small annoyances" train seems to be running and I'm all aboard.

The only thing we'd have to do here is change all the configs, right? Could I also request the minor change of passing parents=True when we create these directories in the checkpointer? That way we don't error out when the original output_dir doesn't exist yet. When I was setting up separate checkpoint dirs for the policy and value models in PPO, I had to create them manually beforehand.
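For reference, the parents=True behavior being requested is just stdlib Path.mkdir; a small sketch (paths here are illustrative):

```python
# Demonstrates why parents=True matters when the configured
# output_dir hierarchy doesn't exist yet.
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stand-in for output_dir

# With parents=True (plus exist_ok=True for idempotent re-runs),
# missing intermediate directories are created as needed.
save_dir = root / "not-created-yet" / "checkpoint_output"
save_dir.mkdir(parents=True, exist_ok=True)
print(save_dir.is_dir())  # True

# Without parents=True, the same call raises FileNotFoundError
# if the parent directory is missing.
try:
    (root / "also-missing" / "metrics").mkdir()
except FileNotFoundError:
    print("mkdir without parents=True raised FileNotFoundError")
```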