Closed xrsrke closed 7 months ago
Reproduce
Use a single dataset for the entire training
data:
  dataset:
    dataset_overwrite_cache: false
    dataset_processing_num_proc_per_process: 1
    hf_dataset_config_name: null
    hf_dataset_or_datasets: HuggingFaceH4/testing_alpaca_small
    hf_dataset_splits: train
    text_column_name: completion
Use different datasets based on training stages
# NOTE: if you wanna use different datasets for different stages of the training
data_stages:
  - name: Stable Training Stage
    start_training_step: 1
    data:
      dataset:
        dataset_overwrite_cache: false
        dataset_processing_num_proc_per_process: 1
        hf_dataset_config_name: null
        hf_dataset_or_datasets: HuggingFaceH4/testing_alpaca_small
        hf_dataset_splits: train
        text_column_name: completion
      num_loading_workers: 1
      seed: 42
  - name: Annealing Phase
    start_training_step: 10
    data:
      dataset:
        dataset_overwrite_cache: false
        dataset_processing_num_proc_per_process: 1
        hf_dataset_config_name: null
        hf_dataset_or_datasets: HuggingFaceH4/testing_alpaca_small
        hf_dataset_splits: train
        text_column_name: completion
      num_loading_workers: 1
      seed: 42
CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml
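To make the `start_training_step` field in the staged config concrete, here is a minimal Python sketch of how a trainer might pick the active stage for a given step. It is an illustration only, not the code in this PR: the `Stage` class and the `active_stage` helper are hypothetical names, and the sketch assumes that a stage stays active until the next stage's `start_training_step` is reached.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    # Hypothetical container mirroring two fields of a `data_stages` entry.
    name: str
    start_training_step: int


def active_stage(stages, step):
    """Return the stage with the largest start_training_step that is <= step.

    Assumption: a stage remains active until a later stage begins.
    """
    eligible = [s for s in stages if s.start_training_step <= step]
    if not eligible:
        raise ValueError(f"no stage starts at or before step {step}")
    return max(eligible, key=lambda s: s.start_training_step)


# Stage names and start steps taken from the config above.
stages = [
    Stage(name="Stable Training Stage", start_training_step=1),
    Stage(name="Annealing Phase", start_training_step=10),
]

for step in (1, 9, 10, 25):
    print(step, active_stage(stages, step).name)
```

Under that assumption, steps 1 through 9 would draw from the first stage's dataset and step 10 onward from the second. A step before the first `start_training_step` raises an error rather than silently falling back to a default.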