Closed staghado closed 4 months ago
The confusion stems from the README, which passes a .sh file
to run_train.py as the config file.
That is wrong: --config-file should point at the YAML config.
From the README:
torchrun --nproc_per_node=8 run_train.py --config-file examples/train_tiny_llama.sh
It should be:
torchrun --nproc_per_node=8 run_train.py --config-file examples/train_tiny_llama.yaml
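A quick guard like the one below would catch this kind of mix-up before torchrun starts. It is only a sketch: check_config_path is a hypothetical helper, not something in the repository, and it assumes run_train.py only accepts YAML configs (.yaml or .yml).

```python
from pathlib import Path

# Hypothetical helper, not part of the repository.
# Assumption: --config-file must point at a YAML file.
YAML_SUFFIXES = {".yaml", ".yml"}


def check_config_path(path: str) -> str:
    """Return the path unchanged if it looks like a YAML config, else raise."""
    suffix = Path(path).suffix.lower()
    if suffix not in YAML_SUFFIXES:
        raise ValueError(
            f"--config-file expects a YAML file, got {path!r}"
        )
    return path


if __name__ == "__main__":
    # The corrected README path passes the check.
    print(check_config_path("examples/train_tiny_llama.yaml"))
```

With the path from the current README (examples/train_tiny_llama.sh) the helper raises ValueError instead of letting the run fail later with a less obvious error.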
Building from main with
pip install -e ".[dev]"
and trying to run the tiny llama training example fails. Repro code:
Output:
Same for mamba.