wangclnlp / Vision-LLM-Alignment

This repo contains the code for supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) designed for vision LLMs.

Could you tell me the details of the preference dataset? #4

Open hhhhzzzzz opened 2 weeks ago

hhhhzzzzz commented 2 weeks ago
{
    "id": "000000245946",
    "image": "000000245946.jpg",
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nWhat considerations should one have when interacting with animals like giraffes in a zoo?"
        },
        {
            "from": "gpt",
            "value": [
                "When interacting with large animals like giraffes in a zoo, visitors should be aware of several considerations to ensure both their safety and the well-being of the animals. In the image, a little girl is feeding a giraffe through the fence. Some recommendations include:\n\n1. Follow the rules and guidelines provided by the zoo: Zoos often have specific instructions for interacting with animals, which are designed to ensure the safety of both visitors and animals.\n\n2. Approach the animals calmly and respectfully: Sudden movements or loud noises can startle the animals, so it's essential to approach them with caution and avoid making loud or sudden sounds.\n\n3. Feed the animals only with appropriate food: Always follow the zoo's instructions regarding the type and amount of food that can be offered to the animals. Some animals might have dietary restrictions or preferences that should be taken into account.\n\n4. Maintain a safe distance from the animals: Although the opportunity to interact closely may be exciting, it is essential to keep a safe distance from the animals, particularly if they are wild or can be unpredictable.\n\n5. Be aware of the animals' living conditions and well-being: Zoos play an important role in conservation and education, but it's crucial to recognize that the animals' welfare is of the utmost importance. Supporting zoos that prioritize animal welfare and conservation efforts can help ensure that the animals are well taken care of.\n\nBy following these recommendations, visitors can enjoy interacting with animals like giraffes in a responsible and respectful manner.",
                "When interacting with animals like giraffes in a zoo, several considerations should be taken into account to ensure the safety and well-being of both the animals and the visitors. In the image, a young girl is feeding leaves to a giraffe inside an enclosure. Here are some guidelines to follow:\n\n1. Follow the zoo's rules and guidelines for interacting with animals, as these may vary between different institutions and can change over time. Always be respectful and adhere to the staff's instructions.\n\n2. Approach the animals slowly and calmly, as sudden movements or loud noises can startle the animals, causing them stress or provoking a defensive response.\n\n3. Feed the animals only with appropriate food provided by the zoo or approved by the staff. Feeding them with unsuitable food can harm their health and disrupt their diet.\n\n4. Maintain a safe distance from the animals, especially if you are unaware of their temperament or history. This is important for personal safety as well as not causing unnecessary stress to the animal.\n\n5. Never attempt to touch or pet an animal without permission or supervision from trained zoo staff. They can assess the situation and ensure the animal's well-being.\n\nBy keeping these considerations in mind, visitors can create a positive experience for both themselves and the animals they interact with in a zoo setting."
            ]
        }
    ]
}

"value": "<image>\nWhat considerations should one have when interacting with animals like giraffes in a zoo?"

What does <image>\n mean? And does it matter whether it is placed before or after the question?

    {
            "from": "gpt",
            "value": [
                "When interacting with large animals like giraffes in a zoo, ...",
                "When interacting with animals like giraffes in a zoo, ..."
            ]
        }

Is the first answer in the value the chosen sample, and the second answer the rejected sample?

Could we change "from": "gpt" to "from": "human" if the answer is generated by a human?

Thanks!
wangclnlp commented 2 weeks ago

Q: What does <image>\n mean? And does it matter whether it is placed before or after the question? A: The <image> token marks the position of the image within the input for vision models. While it can theoretically be placed at any position, it is typically located at either the beginning or the end. During data processing, any original <image> token is removed and a single <image> token is uniformly placed at the start (see code_for_removing_image_token and code_for_adding_image_token). Thus, it is not necessary to add this <image> token manually.
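As an illustration of that preprocessing step, a minimal Python sketch could look like the following (the function name is assumed for clarity; this is not the repository's code_for_removing_image_token / code_for_adding_image_token):

IMAGE_TOKEN = "<image>"

def normalize_image_token(text: str) -> str:
    # Drop any user-placed <image> tokens, then prepend a single one,
    # mirroring the preprocessing described above (illustrative sketch only).
    cleaned = text.replace(IMAGE_TOKEN, "").strip()
    return f"{IMAGE_TOKEN}\n{cleaned}"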

Q: Is the first answer in the value the chosen sample, and the second answer the rejected sample? A: Yes, the first answer is preferred over the second.
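In other words, each preference sample can be read as a (prompt, chosen, rejected) triple whose ranked answers feed a pairwise reward objective. Below is a minimal, illustrative Python sketch of that reading; the JSON field names come from the example above, while the function names and the Bradley-Terry style loss are assumptions rather than the repository's actual implementation:

import torch
import torch.nn.functional as F

def split_preference_sample(sample: dict):
    # The assistant turn carries a list of answers ranked from most to least preferred.
    prompt = sample["conversations"][0]["value"]
    chosen, rejected = sample["conversations"][1]["value"][:2]
    return prompt, chosen, rejected

def pairwise_reward_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the chosen answer's reward above the rejected one's.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()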

Q: Could we change "from": "gpt" to "from": "human" if the answer is generated by a human? A: You may do so, as the value of this field is not used during data processing.

Please let us know if you have any other questions.

hhhhzzzzz commented 2 weeks ago

Hi, the answer to the third question is the same as the answer to the second.

Besides, if I want to use liuhaotian/llava-v1.6-mistral-7b for training on my own data, how should I set up run_rm_training.sh?

VISION_MODEL=base_models/vision_encoder/clip-vit-large-patch14
LLM=base_models/llama-3-8b-Instruct

FROM_CHECKPOINT=models/sft_test/epoch-3

TEMPLATE=llama_3

IMAGE_FOLDER=data/coco_2017/

EPOCH=3
ZERO_STAGE=2

lr=3e-5

DATA_PATH=data/reward_samples.json
EVAL_DATA_PATH=data/reward_samples_test.json
CANDIDATE_NUM=2

IMAGE_FOLDER=data/coco_2017/
DATA="llava_reward"

DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"

OUTPUT=models/reward_test

Thanks!

wangclnlp commented 2 weeks ago

The prior response has been updated; please review it. Below is a training script I developed according to your specifications. However, the liuhaotian/llava-v1.6-mistral-7b model has not been extensively tested, so please inform us if you encounter any issues.

#!/bin/bash
CUR_DIR=`pwd`

ROOT=${CUR_DIR}

export PYTHONPATH=${ROOT}:${PYTHONPATH}

# the launch command below also references LLM and VISION_MODEL;
# they can be set to none when the full model is loaded from FROM_CHECKPOINT
LLM=none
VISION_MODEL=none

# you can replace this path with your path to ``liuhaotian/llava-v1.6-mistral-7b``
FROM_CHECKPOINT=llava-1.5-7b-hf

MODEL_ARCHITECTURE="llava_next"

# !!! Note: We haven't tested llava-v1.6-mistral-7b for reward model training yet,
# so you may need to check this template to see if any code fixes are necessary.
TEMPLATE=llava

EPOCH=1
ZERO_STAGE=3

lr=1e-6

# if you do not train the reward model from a pre-trained reward model,
# this parameter does not need to be set
TRAINED_REWARD_MODEL=none 

OUTPUT=your_output_path

DATA_PATH=your_data_path_for_training_reward_model
EVAL_DATA_PATH=your_data_path_for_test_reward_model

IMAGE_FOLDER=your_image_folder

CANDIDATE_NUM=2

DATA="llava_reward"
DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"

if [ "$ZERO_STAGE" == "" ]; then
    ZERO_STAGE=0
fi

mkdir -p $OUTPUT

cp $0 $OUTPUT

# we assume a global batch size of 128, i.e. Num_GPU * per_device_train_batch_size * gradient_accumulation_steps
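# (for example, with the 8 GPUs used below, per_device_train_batch_size=8 and
#  gradient_accumulation_steps=1, the effective batch size is 8 * 8 * 1 = 64)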
nohup deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port 12335 training/reward_model_training/rm_training_main.py \
    --max_seq_len 2048 --image_folder ${IMAGE_FOLDER} --template ${TEMPLATE} \
    --data_path ${DATA_PATH} --eval_data_path ${EVAL_DATA_PATH} \
    --dataset_names ${DATA} --dataset_samples ${DATA_SAMPLE} --dataset_concatenate_samples ${IMAGE_PER_SAMPLE} --max_num_image_per_sample 8 \
    --lm_reward_model_name_or_path ${LLM} \
    --vision_reward_model_name_or_path ${VISION_MODEL} \
    --gradient_checkpointing --vis_proj baseline \
    --gradient_accumulation_steps 1  --zero_stage $ZERO_STAGE --learning_rate $lr --num_warmup_steps 0.1 \
    --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 \
    --deepspeed --output_dir $OUTPUT --num_train_epochs ${EPOCH} \
    --lang_decoder_update --enable_mmca_attention --model_architecture ${MODEL_ARCHITECTURE} \
    --trained_reward_model $TRAINED_REWARD_MODEL --save_step 9900 \
    --precision bf16 --ranked_candidate_num $CANDIDATE_NUM --from_checkpoint ${FROM_CHECKPOINT} 
hhhhzzzzz commented 2 weeks ago

OK. Thanks a lot! I will try this script to develop reward models with liuhaotian/llava-v1.6-mistral-7b. I'll update the results!

wangclnlp commented 2 weeks ago

We'll be testing this model over the next few days as well. You can also keep checking back for updates.

hhhhzzzzz commented 2 weeks ago

I'm very sorry that I can't release the model since it will be trained on private data.

wangclnlp commented 2 weeks ago

OK. Please let us know if you have any questions during training with the model.

hhhhzzzzz commented 2 weeks ago

OK. Thanks a lot!

hhhhzzzzz commented 2 weeks ago

Hi,

I found some bugs in your code.

  1. https://github.com/wangclnlp/Vision-LLM-Alignment/blob/fc022ee036c9906c430e7185d74889621bd6886d/training/utils/model/build_model.py#L3-L5

I changed it to

from .third_party_model.hf_model.modeling_llava import LlavaForConditionalGeneration
from .third_party_model.hf_model.configuration_llava import LlavaConfig
#from .third_party_model.hf_model.modeling_llava_next import LlavaNextForConditionalGeneration
from transformers import LlavaNextForConditionalGeneration, LlavaNextConfig
#from .third_party_model.hf_model.configuration_llava_next import LlavaNextConfig
  2. https://github.com/wangclnlp/Vision-LLM-Alignment/blob/fc022ee036c9906c430e7185d74889621bd6886d/training/utils/model/modeling_reward.py#L109-L141

I didn't find the parameters for args.reward_model_architecture and args.reward_base_model.

I changed it to

if args.model_architecture=="default" and is_reward:
    vis_llm, reward_image_processor, reward_tokenizer = create_dsvl_model_and_transforms(text_tokenizer=text_tokenizer,
                                                                                        ds_config=ds_config,
                                                                                        args=args)
elif is_reward:
    vis_llm, reward_image_processor, reward_tokenizer = build_model(text_tokenizer=text_tokenizer,
                                                                        ds_config=ds_config,
                                                                        model_architecture=args.model_architecture,
                                                                        from_checkpoint=args.from_checkpoint,
                                                                        args=args)
else:
    vis_llm, reward_image_processor, reward_tokenizer = build_model(text_tokenizer=text_tokenizer,
                                                                ds_config=ds_config,
                                                                args=args)

# load parameters from `from_checkpoint`
if training_reward_stage and args.model_architecture=='default':
    # we have the deepspeed checkpoint, so it is a resumed job
    print(f"load checkpoint from {args.from_checkpoint}")
    vis_llm.load_state_dict(torch.load(os.path.join(args.from_checkpoint, 'pytorch_model.bin'), map_location='cpu'), strict=False)

if is_reward and (args.model_architecture=="llava" or args.model_architecture=="llava_next"):
    vis_reward_model = ViRewardModel(vis_llm=vis_llm,
                                    tokenizer=reward_tokenizer,
                                    is_reward=is_reward,
                                    vis_architecture="llava")
else:
    vis_reward_model = ViRewardModel(vis_llm=vis_llm,
                                    tokenizer=reward_tokenizer,
                                    is_reward=is_reward,
                                    vis_architecture=args.model_architecture)

return vis_reward_model, reward_image_processor, reward_tokenizer

Do you think my modifications are reasonable?

Then I can run the command:

#!/bin/bash
CUR_DIR=`pwd`

ROOT=${CUR_DIR}

export PYTHONPATH=${ROOT}:${PYTHONPATH}

VISION_MODEL=base_models/vision_encoder/clip-vit-large-patch14
LLM=base_models/mistral-7b-instruct-v0.2
# you can replace this path with your path to ``liuhaotian/llava-v1.6-mistral-7b``
FROM_CHECKPOINT=llava-hf/llava-v1.6-mistral-7b-hf

MODEL_ARCHITECTURE="llava_next"

# !!! Note: We haven't tested llava-v1.6-mistral-7b for reward model training yet,
# so you may need to check this template to see if any code fixes are necessary.
TEMPLATE=llava

EPOCH=1
ZERO_STAGE=3

lr=1e-6

# if you do not train the reward model from a pre-trained reward model,
# this parameter does not need to be set
TRAINED_REWARD_MODEL=none 

OUTPUT=output

DATA_PATH=./data/rm_dataset.json
# EVAL_DATA_PATH=none

IMAGE_FOLDER=image_dataset

CANDIDATE_NUM=2

DATA="llava_reward"
DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"

if [ "$ZERO_STAGE" == "" ]; then
    ZERO_STAGE=0
fi

mkdir -p $OUTPUT

cp $0 $OUTPUT

# we assume a global batch size of 128, i.e. Num_GPU * per_device_train_batch_size * gradient_accumulation_steps
deepspeed --include localhost:0,1,2,3,4,5,6,7 --master_port 12335 training/reward_model_training/rm_training_main.py \
    --max_seq_len 2048 --image_folder ${IMAGE_FOLDER} --template ${TEMPLATE} \
    --data_path ${DATA_PATH} \
    --dataset_names ${DATA} --dataset_samples ${DATA_SAMPLE} --dataset_concatenate_samples ${IMAGE_PER_SAMPLE} --max_num_image_per_sample 8 \
    --lm_reward_model_name_or_path ${LLM} \
    --vision_reward_model_name_or_path ${VISION_MODEL} \
    --gradient_checkpointing --vis_proj baseline \
    --gradient_accumulation_steps 2  --zero_stage $ZERO_STAGE --learning_rate $lr --num_warmup_steps 0.1 \
    --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 \
    --deepspeed --output_dir $OUTPUT --num_train_epochs ${EPOCH} \
    --lang_decoder_update --enable_mmca_attention --model_architecture ${MODEL_ARCHITECTURE} \
    --trained_reward_model $TRAINED_REWARD_MODEL --save_step 9900 \
    --precision bf16 --ranked_candidate_num $CANDIDATE_NUM --from_checkpoint ${FROM_CHECKPOINT} 

I don't know how VISION_MODEL=base_models/vision_encoder/clip-vit-large-patch14 and LLM=base_models/mistral-7b-instruct-v0.2 work. Are they unused parameters?

Thanks!

wangclnlp commented 2 weeks ago

We have completed the test on the liuhaotian/llava-v1.6-mistral-7b model. You need to update the code (run git pull).

This is the script we used:

#!/bin/bash
CUR_DIR=`pwd`

ROOT=${CUR_DIR}

export PYTHONPATH=${ROOT}:${PYTHONPATH}

LLM=none
VISION_MODEL=none

# you can replace this path with your path to ``liuhaotian/llava-v1.6-mistral-7b``
FROM_CHECKPOINT=base_models/llava-v1.6-mistral-7b-hf

MODEL_ARCHITECTURE="llava_next"

# !!! Note: We haven't tested llava-v1.6-mistral-7b for reward model training yet,
# so you may need to check this template to see if any code fixes are necessary.
TEMPLATE=llava_next

EPOCH=1
ZERO_STAGE=3

lr=1e-6

# if you do not train the reward model from a pre-trained reward model,
# this parameter does not need to be set
TRAINED_REWARD_MODEL=none 

OUTPUT=models/test

DATA_PATH=data/RLAIF-V-Dataset/rlaif_v_dataset_test.json
EVAL_DATA_PATH=data/RLAIF-V-Dataset/rlaif_v_dataset_test.json

IMAGE_FOLDER=data/RLAIF-V-Dataset/images

CANDIDATE_NUM=2

DATA="llava_reward"
DATA_SAMPLE="all"
IMAGE_PER_SAMPLE="1"

if [ "$ZERO_STAGE" == "" ]; then
    ZERO_STAGE=0
fi

mkdir -p $OUTPUT

cp $0 $OUTPUT

# we assume a global batch size of 128, i.e. Num_GPU * per_device_train_batch_size * gradient_accumulation_steps

nohup deepspeed --include localhost:2,3,4,5,6,7 --master_port 12335 training/reward_model_training/rm_training_main.py \
    --max_seq_len 2048 --image_folder ${IMAGE_FOLDER} --template ${TEMPLATE} \
    --data_path ${DATA_PATH} --eval_data_path ${EVAL_DATA_PATH} \
    --dataset_names ${DATA} --dataset_samples ${DATA_SAMPLE} --dataset_concatenate_samples ${IMAGE_PER_SAMPLE} --max_num_image_per_sample 8 \
    --lm_reward_model_name_or_path ${LLM} \
    --vision_reward_model_name_or_path ${VISION_MODEL} \
    --gradient_checkpointing --vis_proj baseline \
    --gradient_accumulation_steps 1  --zero_stage $ZERO_STAGE --learning_rate $lr --num_warmup_steps 0.1 \
    --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 \
    --deepspeed --output_dir $OUTPUT --num_train_epochs ${EPOCH} \
    --lang_decoder_update --enable_mmca_attention --model_architecture ${MODEL_ARCHITECTURE} \
    --trained_reward_model $TRAINED_REWARD_MODEL --save_step 9900 \
    --precision bf16 --ranked_candidate_num $CANDIDATE_NUM --from_checkpoint ${FROM_CHECKPOINT} > $OUTPUT/training.log &

log:

[2024-08-30 15:25:38,682] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:39,949] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-08-30 15:25:39,949] [INFO] [runner.py:568:main] cmd = /localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=12335 --enable_each_rank_log=None training/reward_model_training/rm_training_main.py --max_seq_len 2048 --image_folder data/RLAIF-V-Dataset/images --template llava_next --data_path data/RLAIF-V-Dataset/rlaif_v_dataset_test.json --eval_data_path data/RLAIF-V-Dataset/rlaif_v_dataset_test.json --dataset_names llava_reward --dataset_samples all --dataset_concatenate_samples 1 --max_num_image_per_sample 8 --lm_reward_model_name_or_path none --vision_reward_model_name_or_path none --gradient_checkpointing --vis_proj baseline --gradient_accumulation_steps 1 --zero_stage 3 --learning_rate 1e-6 --num_warmup_steps 0.1 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --eval_step 200 --deepspeed --output_dir models/test --num_train_epochs 1 --lang_decoder_update --enable_mmca_attention --model_architecture llava_next --trained_reward_model none --save_step 9900 --precision bf16 --ranked_candidate_num 2 --from_checkpoint /localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf
[2024-08-30 15:25:42,279] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:42,789] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [2, 3, 4, 5, 6, 7]}
[2024-08-30 15:25:42,789] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=6, node_rank=0
[2024-08-30 15:25:42,790] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5]})
[2024-08-30 15:25:42,790] [INFO] [launch.py:163:main] dist_world_size=6
[2024-08-30 15:25:42,790] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=2,3,4,5,6,7
[2024-08-30 15:25:42,790] [INFO] [launch.py:253:main] process 1874730 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=0', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,791] [INFO] [launch.py:253:main] process 1874731 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=1', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,792] [INFO] [launch.py:253:main] process 1874732 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=2', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,792] [INFO] [launch.py:253:main] process 1874733 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=3', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,793] [INFO] [launch.py:253:main] process 1874734 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=4', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:42,793] [INFO] [launch.py:253:main] process 1874735 spawned with command: ['/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/bin/python', '-u', 'training/reward_model_training/rm_training_main.py', '--local_rank=5', '--max_seq_len', '2048', '--image_folder', 'data/RLAIF-V-Dataset/images', '--template', 'llava_next', '--data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--eval_data_path', 'data/RLAIF-V-Dataset/rlaif_v_dataset_test.json', '--dataset_names', 'llava_reward', '--dataset_samples', 'all', '--dataset_concatenate_samples', '1', '--max_num_image_per_sample', '8', '--lm_reward_model_name_or_path', 'none', '--vision_reward_model_name_or_path', 'none', '--gradient_checkpointing', '--vis_proj', 'baseline', '--gradient_accumulation_steps', '1', '--zero_stage', '3', '--learning_rate', '1e-6', '--num_warmup_steps', '0.1', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--eval_step', '200', '--deepspeed', '--output_dir', 'models/test', '--num_train_epochs', '1', '--lang_decoder_update', '--enable_mmca_attention', '--model_architecture', 'llava_next', '--trained_reward_model', 'none', '--save_step', '9900', '--precision', 'bf16', '--ranked_candidate_num', '2', '--from_checkpoint', '/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf']
[2024-08-30 15:25:45,490] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,492] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,506] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,531] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,597] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-08-30 15:25:45,624] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-08-30 15:25:46,077] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,077] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-08-30 15:25:46,265] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,291] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,297] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,307] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-08-30 15:25:46,328] [INFO] [comm.py:637:init_distributed] cdb=None

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.80s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
ViRewardModel(
  (v_head): Linear(in_features=4096, out_features=1, bias=False)
  (rwtranrsformer): LlavaNextForConditionalGeneration(
    (vision_tower): CLIPVisionModel(
      (vision_model): CLIPVisionTransformer(
        (embeddings): CLIPVisionEmbeddings(
          (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14), bias=False)
          (position_embedding): Embedding(577, 1024)
        )
        (pre_layrnorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder): CLIPEncoder(
          (layers): ModuleList(
            (0-23): 24 x CLIPEncoderLayer(
              (self_attn): CLIPAttention(
                (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
                (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
              )
              (layer_norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
              (mlp): CLIPMLP(
                (activation_fn): QuickGELUActivation()
                (fc1): Linear(in_features=1024, out_features=4096, bias=True)
                (fc2): Linear(in_features=4096, out_features=1024, bias=True)
              )
              (layer_norm2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            )
          )
        )
        (post_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
    (multi_modal_projector): LlavaNextMultiModalProjector(
      (linear_1): Linear(in_features=1024, out_features=4096, bias=True)
      (act): GELUActivation()
      (linear_2): Linear(in_features=4096, out_features=4096, bias=True)
    )
    (language_model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32064, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralSdpaAttention(
              (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
              (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
              (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
              (rotary_emb): MistralRotaryEmbedding()
            )
            (mlp): MistralMLP(
              (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
              (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
              (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
              (act_fn): SiLU()
            )
            (input_layernorm): MistralRMSNorm()
            (post_attention_layernorm): MistralRMSNorm()
          )
        )
        (norm): MistralRMSNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=32064, bias=False)
    )
  )
)
check tokenizer LlamaTokenizerFast(name_or_path='/localnvme/application/sc_new/wangchenglong_56/base_models/llava-v1.6-mistral-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
    0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
    32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
[DATA] Built dataset llava_reward with all 1000 samples.
/localnvme/application/sc_new/miniconda3/envs/wcl_rlhf_new/lib/python3.10/site-packages/transformers/optimization.py:591: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
[2024-08-30 15:26:00,304] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown
[2024-08-30 15:26:00,304] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized

[2024-08-30 15:26:35,358] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-08-30 15:26:35,360] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-08-30 15:26:35,360] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-08-30 15:26:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2024-08-30 15:26:35,377] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'transformers.optimization.AdamW'>
[2024-08-30 15:26:35,377] [WARNING] [engine.py:1188:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2024-08-30 15:26:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
[2024-08-30 15:26:35,377] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
[2024-08-30 15:26:35,477] [INFO] [utils.py:800:see_memory_usage] Stage 3 initialize beginning
[2024-08-30 15:26:35,478] [INFO] [utils.py:801:see_memory_usage] MA 14.09 GB         Max_MA 14.09 GB         CA 14.24 GB         Max_CA 14 GB 
[2024-08-30 15:26:35,478] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.12 GB, percent = 7.9%
[2024-08-30 15:26:35,480] [INFO] [stage3.py:130:__init__] Reduce bucket size 500,000,000
[2024-08-30 15:26:35,480] [INFO] [stage3.py:131:__init__] Prefetch bucket size 0
[2024-08-30 15:26:35,550] [INFO] [utils.py:800:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2024-08-30 15:26:35,551] [INFO] [utils.py:801:see_memory_usage] MA 14.09 GB         Max_MA 14.09 GB         CA 14.24 GB         Max_CA 14 GB 
[2024-08-30 15:26:35,551] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.13 GB, percent = 7.9%
Parameter Offload: Total persistent parameters: 607232 in 314 params
[2024-08-30 15:26:36,311] [INFO] [utils.py:800:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2024-08-30 15:26:36,312] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB         Max_MA 14.09 GB         CA 16.66 GB         Max_CA 17 GB 
[2024-08-30 15:26:36,312] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.35 GB, percent = 7.9%
[2024-08-30 15:26:37,650] [INFO] [utils.py:800:see_memory_usage] Before creating fp16 partitions
[2024-08-30 15:26:37,651] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB         Max_MA 2.35 GB         CA 16.66 GB         Max_CA 17 GB 
[2024-08-30 15:26:37,651] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.59 GB, percent = 7.9%
[2024-08-30 15:26:40,150] [INFO] [utils.py:800:see_memory_usage] After creating fp16 partitions: 4
[2024-08-30 15:26:40,151] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB         Max_MA 2.35 GB         CA 3.26 GB         Max_CA 17 GB 
[2024-08-30 15:26:40,151] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 83.13 GB, percent = 8.3%
[2024-08-30 15:26:40,228] [INFO] [utils.py:800:see_memory_usage] Before creating fp32 partitions
[2024-08-30 15:26:40,228] [INFO] [utils.py:801:see_memory_usage] MA 2.35 GB         Max_MA 2.35 GB         CA 3.26 GB         Max_CA 3 GB 
[2024-08-30 15:26:40,228] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 81.76 GB, percent = 8.1%
[2024-08-30 15:26:40,329] [INFO] [utils.py:800:see_memory_usage] After creating fp32 partitions
[2024-08-30 15:26:40,329] [INFO] [utils.py:801:see_memory_usage] MA 6.86 GB         Max_MA 7.95 GB         CA 8.86 GB         Max_CA 9 GB 
[2024-08-30 15:26:40,329] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.8 GB, percent = 7.9%
[2024-08-30 15:26:40,439] [INFO] [utils.py:800:see_memory_usage] Before initializing optimizer states
[2024-08-30 15:26:40,440] [INFO] [utils.py:801:see_memory_usage] MA 6.86 GB         Max_MA 6.86 GB         CA 8.86 GB         Max_CA 9 GB 
[2024-08-30 15:26:40,440] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.57 GB, percent = 7.9%
[2024-08-30 15:26:40,520] [INFO] [utils.py:800:see_memory_usage] After initializing optimizer states
[2024-08-30 15:26:40,520] [INFO] [utils.py:801:see_memory_usage] MA 6.86 GB         Max_MA 10.59 GB         CA 12.59 GB         Max_CA 13 GB 
[2024-08-30 15:26:40,521] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.57 GB, percent = 7.9%
[2024-08-30 15:26:40,521] [INFO] [stage3.py:486:_setup_for_real_optimizer] optimizer state initialized

  0%|          | 0/21 [00:00<?, ?it/s]
[2024-08-30 15:26:42,158] [INFO] [utils.py:800:see_memory_usage] After initializing ZeRO optimizer
[2024-08-30 15:26:42,158] [INFO] [utils.py:801:see_memory_usage] MA 10.05 GB         Max_MA 10.54 GB         CA 15.21 GB         Max_CA 15 GB 
[2024-08-30 15:26:42,159] [INFO] [utils.py:808:see_memory_usage] CPU Virtual Memory:  used = 79.62 GB, percent = 7.9%
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x1553fca66c80>
[2024-08-30 15:26:42,159] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-08-30 15:26:42,160] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print]   amp_enabled .................. False
[2024-08-30 15:26:42,160] [INFO] [config.py:1000:print]   amp_params ................... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   bfloat16_enabled ............. True
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   bfloat16_immediate_grad_update  False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   checkpoint_parallel_write_pipeline  False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   checkpoint_tag_validation_enabled  True
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   checkpoint_tag_validation_fail  False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x1553fca67280>
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   communication_data_type ...... None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   curriculum_enabled_legacy .... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   curriculum_params_legacy ..... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   data_efficiency_enabled ...... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   dataloader_drop_last ......... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   disable_allgather ............ False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   dump_state ................... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   dynamic_loss_scale_args ...... None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_enabled ........... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_gas_boundary_resolution  1
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_layer_num ......... 0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_max_iter .......... 100
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_stability ......... 1e-06
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_tol ............... 0.01
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   eigenvalue_verbose ........... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   elasticity_enabled ........... False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   fp16_auto_cast ............... None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   fp16_enabled ................. False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   fp16_master_weights_and_gradients  False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   global_rank .................. 0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   grad_accum_dtype ............. None
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   gradient_accumulation_steps .. 1
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   gradient_clipping ............ 1.0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   gradient_predivide_factor .... 1.0
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   graph_harvesting ............. False
[2024-08-30 15:26:42,161] [INFO] [config.py:1000:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   initial_dynamic_scale ........ 1
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   load_universal_checkpoint .... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   loss_scale ................... 1.0
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   memory_breakdown ............. False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   mics_hierarchial_params_gather  False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   mics_shard_size .............. -1
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   optimizer_legacy_fusion ...... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   optimizer_name ............... None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   optimizer_params ............. None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   pld_enabled .................. False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   pld_params ................... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   prescale_gradients ........... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   scheduler_name ............... None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   scheduler_params ............. None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   seq_parallel_communication_data_type  torch.float32
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   sparse_attention ............. None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   sparse_gradients_enabled ..... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   steps_per_print .............. 10
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   train_batch_size ............. 48
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   train_micro_batch_size_per_gpu  8
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   use_data_before_expert_parallel_  False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   use_node_local_storage ....... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   wall_clock_breakdown ......... False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   weight_quantization_config ... None
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   world_size ................... 6
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   zero_allow_untested_optimizer  True
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=0 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   zero_enabled ................. True
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   zero_force_ds_cpu_optimizer .. False
[2024-08-30 15:26:42,162] [INFO] [config.py:1000:print]   zero_optimization_stage ...... 3
[2024-08-30 15:26:42,162] [INFO] [config.py:986:print_user_config]   json = {
    "train_batch_size": 48, 
    "train_micro_batch_size_per_gpu": 8, 
    "steps_per_print": 10, 
    "zero_optimization": {
        "stage": 3, 
        "offload_param": {
            "device": "none"
        }, 
        "offload_optimizer": {
            "device": "none"
        }, 
        "stage3_param_persistence_threshold": 1.000000e+04, 
        "stage3_max_live_parameters": 3.000000e+07, 
        "stage3_prefetch_bucket_size": 0, 
        "memory_efficient_linear": false
    }, 
    "zero_allow_untested_optimizer": true, 
    "zero_force_ds_cpu_optimizer": false, 
    "fp16": {
        "enabled": false, 
        "loss_scale_window": 100
    }, 
    "bf16": {
        "enabled": true
    }, 
    "gradient_clipping": 1.0, 
    "prescale_gradients": false, 
    "wall_clock_breakdown": false, 
    "hybrid_engine": {
        "enabled": false, 
        "max_out_tokens": 512, 
        "inference_tp_size": 1, 
        "release_inference_cache": false, 
        "pin_parameters": true, 
        "tp_gather_partition_size": 8
    }
}
***** Before training *****
***** Evaluation Begin *****

  0%|          | 0/21 [00:00<?, ?it/s]
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)

  5%|▍         | 1/21 [00:09<03:14,  9.74s/it]
 10%|▉         | 2/21 [00:18<02:54,  9.17s/it]
 14%|█▍        | 3/21 [00:26<02:37,  8.74s/it]
 19%|█▉        | 4/21 [00:35<02:26,  8.64s/it]
 24%|██▍       | 5/21 [00:42<02:12,  8.27s/it]
 29%|██▊       | 6/21 [00:51<02:06,  8.42s/it]

Please let us know if you have other questions.
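
For anyone trying to reproduce a run like the one logged above, here is a minimal sketch of initializing DeepSpeed with an equivalent ZeRO stage-3 / bf16 config. The `model` and `optimizer` below are placeholders, and the batch sizes assume 6 GPUs as in this log; this is not the repo's actual launch script.

```python
# Minimal sketch (not this repo's launch code): build a DeepSpeed engine with a config
# equivalent to the user config printed in the log above
# (ZeRO stage 3, bf16, train_batch_size 48 = 8 micro-batch x 6 GPUs x 1 grad-accum step).
import deepspeed
import torch

ds_config = {
    "train_batch_size": 48,
    "train_micro_batch_size_per_gpu": 8,
    "steps_per_print": 10,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "none"},
        "offload_optimizer": {"device": "none"},
        "stage3_param_persistence_threshold": 1e4,
        "stage3_max_live_parameters": 3e7,
        "stage3_prefetch_bucket_size": 0,
        "memory_efficient_linear": False,
    },
    "zero_allow_untested_optimizer": True,
    "zero_force_ds_cpu_optimizer": False,
    "bf16": {"enabled": True},
    "gradient_clipping": 1.0,
}

# Placeholders: swap in the vision LLM and optimizer that your training script builds.
model = torch.nn.Linear(4096, 4096)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config=ds_config,
)
```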

wangclnlp commented 2 weeks ago

However, we do not check this template; you need to modify it yourself according to your settings in the file `DST.py`.
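
As a purely hypothetical illustration (the real template lives in `DST.py` and may look quite different), a LLaVA-style prompt template usually combines a system prompt, role tags, and an `<image>` placeholder along these lines:

```python
# Hypothetical illustration only -- consult DST.py in this repo for the actual template.
# LLaVA-style templates typically combine a system prompt, role tags, and an image
# placeholder token that the data loader later replaces with visual features.
DEFAULT_IMAGE_TOKEN = "<image>"

def build_prompt(question: str,
                 system: str = "You are a helpful vision-language assistant.") -> str:
    """Format a single-turn prompt; adjust role tags and separators to your model."""
    return (
        f"{system}\n"
        f"USER: {DEFAULT_IMAGE_TOKEN}\n{question}\n"
        f"ASSISTANT:"
    )

print(build_prompt("Describe the main objects in this image."))
```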

hhhhzzzzz commented 2 weeks ago

I can run my experiment now; let's see the results. Thanks!

hhhhzzzzz commented 4 days ago

I get around 80% accuracy on my internal data. I have a question: could the Bradley-Terry model be used for the loss function?

wangclnlp commented 4 days ago

When you use such a comparison pair (i.e., only two ranked responses), the Plackett-Luce model is equivalent to the Bradley-Terry model. You can refer to the derivation in the background section of this paper.
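
As a quick check of this equivalence, here is a minimal sketch (not the repo's actual reward-model loss code) showing that with only two candidates the Plackett-Luce negative log-likelihood equals the Bradley-Terry loss -log sigmoid(r_chosen - r_rejected):

```python
# Minimal sketch (not the repo's loss code): for K = 2 ranked responses, the
# Plackett-Luce negative log-likelihood equals the Bradley-Terry pairwise loss.
import torch
import torch.nn.functional as F

def plackett_luce_loss(rewards: torch.Tensor) -> torch.Tensor:
    """rewards: shape (K,), reward scores ordered from most to least preferred."""
    loss = torch.zeros(())
    for k in range(rewards.size(0) - 1):
        # -log P(response k is ranked above all remaining responses)
        loss = loss - (rewards[k] - torch.logsumexp(rewards[k:], dim=0))
    return loss

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # -log sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected)

rewards = torch.tensor([1.3, -0.4])  # [chosen, rejected]
print(plackett_luce_loss(rewards))                 # identical values
print(bradley_terry_loss(rewards[0], rewards[1]))  # for the two losses
```

With longer rankings (K > 2) the two objectives differ, which is exactly where the Plackett-Luce form generalizes the pairwise Bradley-Terry case.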