tongxuluo / prts

https://llm-stacking.github.io/

Prts config examples #2

Open spliew opened 3 days ago

spliew commented 3 days ago

Thanks for the great work! Could you provide more details on how to set the configs to run experiments on different growth operators as described in the paper (other than G_stack)?

tongxuluo commented 2 days ago

Thank you for your interest!

An example G_learn operator config:

{
    "src_config_name": "6L2048H",
    "trg_config_name": "24L2048H"
}
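
(In these config names, 6L2048H denotes a 6-layer model with hidden size 2048, so this config asks G_learn to map a 6-layer source onto a 24-layer target of the same width.)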

An example script:

srun python pretrain/run_pretrain.py \
    --num_nodes=2 \
    --model_name=24L2048H \
    --name=24L2048H \
    --method=ligo \
    --config_path=/path/to/your/config/ligo_6L_24L.json \
    --out_dir=/path/to/your/out_dir \
    --train_data_dir=/path/to/your/training_data \
    --src_init_path=/path/to/your/source_model.pth \
    --devices=8 \
    --global_batch_size=1024 \
    --learning_rate=7e-5 \
    --min_lr=7e-6 \
    --micro_batch_size=8 \
    --max_step=100 \
    --warmup_steps=50 \
    --log_step_interval=1 \
    --eval_iters=10000 \
    --save_step_interval=100 \
    --eval_step_interval=100000 \
    --weight_decay=1e-1 \
    --beta1=0.9 \
    --beta2=0.95 \
    --grad_clip=1.0 \
    --decay_lr=True

G_learn trains a hypernetwork that transfers the source model's weights to the target model. After G_learn finishes, you must continually pretrain the grown model; note that this second stage uses --method=scratch and initializes from the G_learn output via --src_init_path:

srun python pretrain/run_pretrain.py \
    --num_nodes=16 \
    --model_name=24L2048H \
    --name=24L2048H \
    --method=scratch \
    --out_dir=/path/to/your/out_dir \
    --train_data_dir=/path/to/your/training_data \
    --src_init_path=/path/to/your/model_after_G_learn.pth \
    --devices=8 \
    --global_batch_size=1024 \
    --learning_rate=3e-4 \
    --min_lr=3e-5 \
    --micro_batch_size=8 \
    --max_step=300000 \
    --warmup_steps=3000 \
    --log_step_interval=1 \
    --eval_iters=10000 \
    --save_step_interval=5000 \
    --eval_step_interval=5000 \
    --weight_decay=1e-1 \
    --beta1=0.9 \
    --beta2=0.95 \
    --grad_clip=1.0 \
    --decay_lr=True

Moreover, here is an example config for widthwise G_zero:

{
    "src_config_name": "24L1024H",
    "trg_config_name": "24L2048H",
    "src_path": "/path/to/your/source_model.pth"
}
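
(Widthwise G_zero grows the hidden size from 1024 to 2048 while keeping the depth at 24 layers; as the name suggests, the newly added width is presumably zero-initialized around the source checkpoint given in src_path.)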

Then run the following script to start your pretraining:

srun python pretrain/run_pretrain.py \
    --num_nodes=4 \
    --model_name=24L2048H \
    --name=24L2048H \
    --method=zero \
    --config_path=/path/to/your/config/zero_1024H_2048H.json \
    --out_dir=/path/to/your/out_dir \
    --train_data_dir=/path/to/your/training_data \
    --src_init_path=/path/to/your/source_model.pth \
    --devices=8 \
    --global_batch_size=1024 \
    --learning_rate=3e-4 \
    --min_lr=3e-5 \
    --micro_batch_size=8 \
    --max_step=300000 \
    --warmup_steps=3000 \
    --log_step_interval=1 \
    --eval_iters=10000 \
    --save_step_interval=5000 \
    --eval_step_interval=5000 \
    --weight_decay=1e-1 \
    --beta1=0.9 \
    --beta2=0.95 \
    --grad_clip=1.0 \
    --decay_lr=True

We apologize for not having integrated depthwise G_zero yet. Currently, depthwise G_zero is still a standalone Python script: you can run G_zero_depthwise.py with lemon=True to apply depthwise G_zero to a Hugging Face model. After obtaining the weights, you can use scripts/convert_hf_checkpoint.py to convert them into the lit-gpt format for continual pretraining; a sketch of this two-step workflow follows.
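
A minimal sketch of that workflow, assuming illustrative flag names (check the argparse definitions in G_zero_depthwise.py and scripts/convert_hf_checkpoint.py for the exact arguments):

# Step 1 (hypothetical flags): apply depthwise G_zero to a Hugging Face
# checkpoint, running with lemon=True as mentioned above.
python G_zero_depthwise.py \
    --src_path=/path/to/your/hf_source_model \
    --out_dir=/path/to/your/hf_grown_model \
    --lemon=True

# Step 2: convert the grown Hugging Face weights into the lit-gpt format
# so they can be passed to run_pretrain.py via --src_init_path.
python scripts/convert_hf_checkpoint.py \
    --checkpoint_dir=/path/to/your/hf_grown_model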

For G_random, an example config is:

{
    "src_config_name": "6L2048H",
    "trg_config_name": "24L2048H",
    "src_path": "/path/to/your/source_model.pth",
    "grow_step": 5000
}
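
(Here, grow_step presumably sets the pretraining step at which the 6-layer source is grown into the 24-layer target; treat 5000 as an illustrative value.)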

You can then run your pretraining with the following script:

srun python pretrain/run_pretrain.py \
    --num_nodes=16 \
    --model_name=24L2048H \
    --name=24L2048H \
    --method=msg \
    --config_path=/path/to/your/config/msg_6L_24L.json \
    --out_dir=/path/to/your/out_dir \
    --train_data_dir=/path/to/your/training_data \
    --devices=8 \
    --global_batch_size=1024 \
    --learning_rate=3e-4 \
    --min_lr=3e-5 \
    --micro_batch_size=8 \
    --max_step=300000 \
    --warmup_steps=3000 \
    --log_step_interval=1 \
    --eval_iters=10000 \
    --save_step_interval=5000 \
    --eval_step_interval=5000 \
    --weight_decay=1e-1 \
    --beta1=0.9 \
    --beta2=0.95 \
    --grad_clip=1.0 \
    --decay_lr=True