Closed TJ-Solergibert closed 1 month ago
`model_weights_only` only takes effect for the very last checkpoint, after all training is done. We cannot use `model_weights_only` for the other checkpoints, because those checkpoints are also used for fault tolerance.
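A rough back-of-envelope sketch of why the intermediate checkpoints are so much larger than a weights-only export. The breakdown below (fp32 weights plus two fp32 Adam states in the full checkpoint, bf16 for the weights-only export) is an assumption for illustration, not confirmed from the torchtitan source:

```python
# Back-of-envelope checkpoint sizes for an ~8B-parameter model.
# Assumption (illustrative, not from the torchtitan source): the full
# checkpoint stores fp32 weights plus two fp32 Adam states (4 bytes
# each), while a weights-only export uses bf16 (2 bytes per param).
params = 8.03e9  # approximate Llama3-8B parameter count

full_ckpt_gb = params * 4 * 3 / 1e9   # weights + exp_avg + exp_avg_sq
weights_only_gb = params * 2 / 1e9    # bf16 weights only

print(f"full checkpoint: ~{full_ckpt_gb:.0f} GB")   # ~96 GB
print(f"weights only:    ~{weights_only_gb:.0f} GB") # ~16 GB
```

Under those assumptions the numbers land close to the ~90 GB and ~16 GB reported below, which is consistent with the intermediate checkpoints carrying optimizer state for fault tolerance.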
Hi,

With the llama3-8B config, setting `model_weights_only = true` (or `false`) in the .toml file, or passing the `--checkpoint.model_weights_only` flag, produces exactly the same checkpoints — same size, even across multiple runs. Also, the HF Llama3-8B checkpoints are ~16 GB, compared to the ~90 GB it's producing. Running with DP = 4, PP = 1 & TP = 1.
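For reference, a minimal sketch of the `[checkpoint]` section of the .toml config being used. Only `model_weights_only` is taken from this thread; the other key names and values are assumptions for illustration:

```toml
[checkpoint]
enable_checkpoint = true    # assumed key name, for illustration
interval = 500              # illustrative value
model_weights_only = true   # per the maintainer: only applied to the final checkpoint
```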