KuuCi closed this 2 months ago.
This seems like a breaking change. Do we have a deprecation plan for existing mcli YAMLs? I think a lot of people call composer scripts/train/train.py right now.
We aren't deleting scripts/train/train.py; scripts/train/train.py is just calling train/train.py now. Here is a run showing that the existing workflow still works:
test-cli-ZzkqPt runs:
composer train/train.py /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
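The compatibility argument above amounts to a thin-wrapper pattern: the old script path stays in place, but it only forwards its arguments to shared training code. A minimal sketch, assuming a hypothetical shared function named train_from_yaml (not the actual llm-foundry API), stubbed here to echo its inputs:

```python
from typing import Sequence


def train_from_yaml(yaml_path: str, overrides: Sequence[str]) -> dict:
    """Hypothetical shared training entry point (stub that echoes its inputs)."""
    return {"yaml_path": yaml_path, "overrides": list(overrides)}


def main(argv: Sequence[str]) -> dict:
    # The legacy script keeps its path and CLI shape, but does no work itself:
    # it splits off the YAML path and forwards everything else unchanged.
    yaml_path, *overrides = argv
    return train_from_yaml(yaml_path, overrides)
```

In a real script, main would be called with sys.argv[1:]; because it holds no training logic of its own, existing mcli YAMLs that point at the old path keep running the same code.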
Ah, thanks for pointing that out. I'll give a more detailed review later.
Will update this to match the changes merged into scripts/train/train.py after the first pass.
Manual test runs updated.
This PR allows users to call
composer llm-foundry train {YAML_PATH} {ARGS}
while maintaining correctness with composer llm-foundry/train.py {PATH} {ARGS}.
The motivation is DLE, where we want to make the CLI much more intuitive in the Docker images.
Testing:
test-cli-cSn2Rb runs:
composer -c -n 8 llmfoundry train /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
test-cli-qsRHEI runs:
composer -c llmfoundry train /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
test-cli-vGpXcw runs:
composer train/train.py /mnt/config/parameters.yaml || (echo "Command failed - killing python" && pkill python && exit 1)
Here is the MLflow experiment folder showing that all three runs behave the same: https://dbc-04ac0685-8857.staging.cloud.databricks.com/ml/experiments/3707544126254710?o=3360802220363900&searchFilter=&orderByKey=attributes.start_time&orderByAsc=false&startTime=ALL&lifecycleFilter=Active&modelVersionFilter=All+Runs&datasetsFilter=W10%3D
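One way to picture why the new subcommand and the old script path can behave identically is that both resolve to the same training function. The sketch below uses Python's standard argparse with a train subcommand; the function name train_from_yaml and the treatment of trailing arguments as key=value overrides are assumptions for illustration, and the PR's actual CLI framework may differ.

```python
import argparse
from typing import Sequence


def train_from_yaml(yaml_path: str, overrides: Sequence[str]) -> dict:
    """Hypothetical shared training entry point (stub that echoes its inputs)."""
    return {"yaml_path": yaml_path, "overrides": list(overrides)}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llmfoundry")
    subcommands = parser.add_subparsers(dest="command", required=True)
    train = subcommands.add_parser("train", help="Train from a YAML config.")
    train.add_argument("yaml_path")
    # Everything after the YAML path is passed through untouched as overrides.
    train.add_argument("overrides", nargs=argparse.REMAINDER)
    return parser


def cli_main(argv: Sequence[str]) -> dict:
    args = build_parser().parse_args(list(argv))
    # Only one subcommand exists in this sketch, so dispatch is direct.
    return train_from_yaml(args.yaml_path, args.overrides)
```

Keeping the subcommand a pure argument-parsing layer means training behavior lives in one place, which is what makes a side-by-side comparison of the runs meaningful.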