[Feature] Add loading different datasets based on training stages

Reproduce

Step 1: Modify your config:

To use a single dataset for the entire training run:

data:
  dataset:
    dataset_overwrite_cache: false
    dataset_processing_num_proc_per_process: 1
    hf_dataset_config_name: null
    hf_dataset_or_datasets: HuggingFaceH4/testing_alpaca_small
    hf_dataset_splits: train
    text_column_name: completion

To use different datasets at different training stages:

# NOTE: use data_stages to load different datasets for different stages of training
data_stages:
  - name: Stable Training Stage
    start_training_step: 1
    data:
      dataset:
        dataset_overwrite_cache: false
        dataset_processing_num_proc_per_process: 1
        hf_dataset_config_name: null
        hf_dataset_or_datasets: HuggingFaceH4/testing_alpaca_small
        hf_dataset_splits: train
        text_column_name: completion
      num_loading_workers: 1
      seed: 42
  - name: Annealing Phase
    start_training_step: 10
    data:
      dataset:
        dataset_overwrite_cache: false
        dataset_processing_num_proc_per_process: 1
        hf_dataset_config_name: null
        hf_dataset_or_datasets: HuggingFaceH4/testing_alpaca_small
        hf_dataset_splits: train
        text_column_name: completion
      num_loading_workers: 1
      seed: 42
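
As a rough illustration of how a list like `data_stages` can be mapped to a training step, here is a minimal sketch. It is not the code from this PR: the `DATA_STAGES` list and the `active_stage` helper are hypothetical, and it assumes a stage becomes active at its `start_training_step` and stays active until the next stage starts.

```python
from bisect import bisect_right

# Hypothetical in-memory mirror of the `data_stages` entries in the config above.
DATA_STAGES = [
    {"name": "Stable Training Stage", "start_training_step": 1},
    {"name": "Annealing Phase", "start_training_step": 10},
]


def active_stage(stages, step):
    """Return the stage with the largest start_training_step that is <= step.

    Assumption (not confirmed by the PR): a stage stays active until the
    next stage's start_training_step is reached.
    """
    ordered = sorted(stages, key=lambda s: s["start_training_step"])
    starts = [s["start_training_step"] for s in ordered]
    idx = bisect_right(starts, step) - 1
    if idx < 0:
        raise ValueError(f"no stage starts at or before step {step}")
    return ordered[idx]


if __name__ == "__main__":
    for step in (1, 9, 10, 25):
        print(step, active_stage(DATA_STAGES, step)["name"])
```

Under that assumption, steps 1 through 9 would read from the first stage's dataset and step 10 onward from the second.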

Step 2: Launch training:

CUDA_DEVICE_MAX_CONNECTIONS=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml

[Feature] Add loading different datasets based on training stages #113