huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Memory Issue with 7b Model Fine-Tuning on 6 H100 GPUs #16

Open apt-team-018 opened 10 months ago

apt-team-018 commented 10 months ago

Hello everyone, I'm encountering a memory issue while fine-tuning a 7B model (such as Mistral) using a repository I found. Despite having 6 H100 GPUs at my disposal, I run into out-of-memory errors when using a batch size of 4. Interestingly, when I use libraries like Axolotl for similar tasks, I don't face this problem. Could anyone suggest how to resolve these memory issues with the specific repository I'm using for fine-tuning? Any help would be greatly appreciated!
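As a rough sanity check on the numbers in this report, the sketch below estimates how much memory the training state alone would need. Everything in it is an assumption rather than something stated in the thread: the 16 bytes/parameter figure is the common rule of thumb for mixed-precision Adam, the parameter count is taken as exactly 7e9, the H100s are assumed to be the 80 GB variant, and `training_state_gb` is a hypothetical helper name. Activations are not counted, so real usage would be higher.

```python
def training_state_gb(num_params: float, bytes_per_param: float = 16.0) -> float:
    """Rough size of weights + gradients + optimizer state, in GB (1e9 bytes).

    The default of 16 bytes/param is a rule of thumb for mixed-precision Adam:
    2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master weights)
    + 4 + 4 (the two Adam moment estimates). Activations are NOT included.
    """
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 7e9      # assumption: a "7B" model has exactly 7e9 parameters
NUM_GPUS = 6          # from the post
GPU_MEMORY_GB = 80    # assumption: 80 GB H100 variant

total = training_state_gb(NUM_PARAMS)
# If all training state is sharded evenly across GPUs (ZeRO-3 / FSDP style).
per_gpu_sharded = total / NUM_GPUS

print(f"replicated on every GPU: {total:.0f} GB (limit {GPU_MEMORY_GB} GB)")
print(f"sharded across {NUM_GPUS} GPUs: {per_gpu_sharded:.1f} GB per GPU")
```

Under these assumptions the state would not fit on a single GPU when fully replicated but would fit comfortably when sharded, so whether sharding is actually active in the launch configuration is worth checking first.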

edbeeching commented 10 months ago

Hi, are you referring to this repo or a different one? Your question doesn't make that clear.

apt-team-018 commented 9 months ago

Yes, the repository I'm referring to is huggingface/alignment-handbook. Despite having 6 H100 GPUs for a 7B-parameter model, I'm encountering out-of-memory issues. I've set per_device_train_batch_size = 1, but the final batch size somehow ends up being 24, which is likely causing the memory overflow. This issue is preventing me from fine-tuning a 34-billion-parameter model on this setup.
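The figure of 24 is consistent with how a total (effective) batch size is usually derived: per-device batch size times number of GPUs times gradient accumulation steps. The sketch below assumes gradient_accumulation_steps = 4, as in the config posted further down for the 8-GPU run; whether the 6-GPU run used the same value is not stated. `effective_batch_size` is a hypothetical helper name.

```python
def effective_batch_size(
    per_device_batch_size: int,
    num_gpus: int,
    gradient_accumulation_steps: int = 1,
) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device_batch_size * num_gpus * gradient_accumulation_steps


# 1 sequence per GPU, 6 GPUs, 4 accumulation steps (the last value is an assumption).
print(effective_batch_size(1, 6, 4))
```

Each forward/backward pass still processes only per_device_train_batch_size sequences on each GPU, because gradient accumulation runs the micro-batches one after another. Under these assumptions, the reported 24 would not by itself explain the out-of-memory error.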

Additionally, I attempted to fine-tune a 6-billion-parameter model using 8 A100 GPUs, but the training runs were interrupted. The first attempt stopped at 0.15 epochs. On the second attempt, where I started from 2 epochs, it oddly skipped some epochs, jumping from 0.15 directly to 1, and then stopped at 2.25. For more detailed information, see this WandB run: https://wandb.ai/neural-network-018/huggingface/runs/8xmy6gtd/

Configs -

Model arguments

model_name_or_path: 01-ai/Yi-6B
model_revision: main
torch_dtype: bfloat16
use_flash_attention_2: false
trust_remote_code: true

Data training arguments

dataset_mixer:
  communityai/apt-chat-micro-dataset-llm-v2-714k: 0.4
dataset_splits:

SFT trainer config

bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: false
hub_model_id: apt-chat-yi-6B-sft-full
hub_strategy: every_save
learning_rate: 0.00002
log_level: info
logging_steps: 50
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 4096
max_steps: -1
num_train_epochs: 2
output_dir: data/apt-chat-yi-6B-sft-full
overwrite_output_dir: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
push_to_hub: true
remove_unused_columns: true
report_to:


LOGS -

INFO:root:Using nproc_per_node=8.
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING]
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING]
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING]
[2023-11-14 02:09:45,328] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,584] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
  warnings.warn(
[2023-11-14 02:09:45,607] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:45 - WARNING - __main__ - Process rank: 7, device: cuda:7, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:45,646] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,793] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,832] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,834] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,835] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,864] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:45,908] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-11-14 02:09:45 - WARNING - __main__ - Process rank: 5, device: cuda:5, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:45,939] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:45,939] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-11-14 02:09:45 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-14 02:09:45 - INFO - __main__ - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='01-ai/Yi-6B', model_revision='main', model_code_revision=None, torch_dtype='bfloat16', trust_remote_code=True, use_flash_attention_2=False, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)
2023-11-14 02:09:45 - INFO - __main__ - Data parameters DataArguments(chat_template=None, dataset_mixer={'communityai/apt-chat-micro-dataset-llm-v2-714k': 0.4}, dataset_splits=['train', 'test'], max_train_samples=None, max_eval_samples=None, preprocessing_num_workers=12, truncation_side=None)
2023-11-14 02:09:45 - INFO - __main__ - Training/evaluation parameters SFTConfig(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=apt-chat-yi-6B-sft-full,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=data/apt-chat-yi-6B-sft-full/runs/Nov14_02-09-45_6191edb408fa,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=50,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_seq_length=4096,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=2,
optim=adamw_torch,
optim_args=None,
output_dir=data/apt-chat-yi-6B-sft-full,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=data/apt-chat-yi-6B-sft-full,
save_on_each_node=False,
save_safetensors=True,
save_steps=500,
save_strategy=no,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
[2023-11-14 02:09:46,074] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:46 - WARNING - __main__ - Process rank: 2, device: cuda:2, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:46,109] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:46,110] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:46,118] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:46 - WARNING - __main__ - Process rank: 6, device: cuda:6, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-14 02:09:46 - WARNING - __main__ - Process rank: 4, device: cuda:4, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-14 02:09:46 - WARNING - __main__ - Process rank: 3, device: cuda:3, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:46,193] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:46 - WARNING - __main__ - Process rank: 1, device: cuda:1, n_gpu: 1 distributed training: True, 16-bits training: False
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
  table = cls._concat_blocks(blocks, axis=0)
Overwrite dataset info from restored data version if exists.
2023-11-14 02:09:47 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:47 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
2023-11-14 02:09:47 - INFO - datasets.builder - Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:47 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
Overwrite dataset info from restored data version if exists.
2023-11-14 02:09:48 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:48 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
2023-11-14 02:09:48 - INFO - datasets.builder - Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:48 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-af78090beb4300c1.arrow
2023-11-14 02:09:48 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-af78090beb4300c1.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-2bfe21b70f725afe.arrow
2023-11-14 02:09:48 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-2bfe21b70f725afe.arrow
2023-11-14 02:09:48 - INFO - __main__ - Training on the following datasets and their proportions: ['train : 285436', 'test : 500']
++++++++++++++++++++++++++++++++++++++
YiTokenizer(name_or_path='01-ai/Yi-6B', vocab_size=64000, model_max_length=4096, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '', 'pad_token': ''}, clean_up_tokenization_spaces=False), added_tokens_decoder={
  0: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
  1: AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
  2: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
}
[INFO|tokenization_utils_base.py:2022] 2023-11-14 02:09:49,077 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/tokenizer.model
[INFO|tokenization_utils_base.py:2022] 2023-11-14 02:09:49,077 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-11-14 02:09:49,077 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-11-14 02:09:49,077 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/tokenizer_config.json
[INFO|tokenization_utils_base.py:2022] 2023-11-14 02:09:49,077 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/tokenizer.json
Loading cached processed dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-b66e038ac76c07bd.arrow
2023-11-14 02:09:49 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-b66e038ac76c07bd.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-33b9882f3c1db716.arrow
2023-11-14 02:09:49 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/communityai___apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-33b9882f3c1db716.arrow
2023-11-14 02:09:49 - INFO - __main__ - Sample 58369 of the processed training set:

<|system|> You are an Information Extraction Specialist AI. When presented with dense or multifaceted content, meticulously identify, extract, and present the key pieces of information embedded within. Your responses should distill the most pertinent details, streamlining the data into a more accessible and concise format. Prioritize accuracy and clarity, ensuring that extracted information maintains its original context and significance.<|endoftext|> <|user|> There are several roles that militant groups fill with child soldiers. In many cases, children participate directly in conflict, but they can also be used for other dangerous support roles. Many are porters who carry heavy loads of ammunition or injured soldiers, while others are lookouts or cooks. Girls are often forced to be sex slaves.

Many children are forced to join military groups at a young age. Child soldiers are also easier to manipulate and force into conflict. Recruiters typically target children from troubled areas or conflict zones, likely accustomed to violence and with fewer educational or work opportunities.

This separation leaves children without any means of safety or security, so they choose to become child soldiers as a form of protection.

A child soldier is not just someone who is involved in fighting. They can also be those in other roles such as cooks, porters, messengers, human shields, spies, suicide bombers or those used for sexual exploitation. It includes children recruited and trained for military purposes, but not used in war.

Child soldiers are children (individuals under the age of 18) who are used for military purposes. According to the United Nations Convention on the Rights of the Child and international human rights law, no child under 18 may be recruited into armed forces (government military) or armed rebel groups (militias and gangs).

why do people use child soldiers?<|endoftext|> <|assistant|> People use child soldiers because they are easier to manipulate and force into conflict[2]. They can be used for many roles in a military group, such as fighting, carrying heavy loads of ammunition, being lookouts, cooks, or sex slaves[1]. Additionally, children have fewer educational or work opportunities and this separation leaves them without any means of safety or security, so they often choose to become child soldiers as a form of protection[3]. Moreover, child soldiers are not just those involved in fighting, but can also include those in other roles such as cooks, porters, messengers, human shields, spies, suicide bombers, and those used for sexual exploitation[4]. According to the United Nations Convention on the Rights of the Child and international human rights law, no child under 18 may be recruited into armed forces or armed rebel groups[5].<|endoftext|>

2023-11-14 02:09:49 - INFO - __main__ - Sample 13112 of the processed training set:

<|system|> You are an AI assistant. Respond with accuracy and coherence.<|endoftext|> <|user|> Rewrite: Task: Research and analyze the socio-economic impact of illicit drug manufacturing on local communities in developing countries.

Rewritten:

Task: Conduct a detailed analysis of the complex interplay between illicit drug manufacturing and socio-economic factors in developing countries. Specifically, explore the economic benefits and drawbacks of drug production and the resulting social impacts on local communities, including addiction rates and crime. Use case studies from specific developing countries to inform your recommendations for effective policies aimed at mitigating the negative effects of drug production on communities. Your report should consist of an introduction, an overview of illicit drug manufacturing in developing countries, a detailed analysis of the economic and social impacts, case studies, policy recommendations, and a conclusion.<|endoftext|> <|assistant|> The task is to thoroughly examine the intricate relationship between illicit drug production and socio-economic factors in developing nations. The aim is to explore the positives and negatives of drug manufacturing on the economy and to investigate the social impacts on local communities such as addiction and criminal activity. Through case studies, a comprehensive analysis will be conducted to inform recommendations on appropriate policies aimed at reducing the negative effects of drug production. The report is expected to comprise an introduction, a comprehensive review of illicit drug manufacturing in developing countries, a thorough analysis of the economic and social impacts, case studies, policy recommendations, and a conclusion.<|endoftext|>

2023-11-14 02:09:49 - INFO - __main__ - Sample 144194 of the processed training set:

<|system|> You are an AI assistant. Respond with accuracy and coherence.<|endoftext|> <|user|> I am looking for a way to calculate how similar each number is compared to every other number in a set. Here's an example: [61, 62, 59, 60, 63, 62, 58] would have a high "similarity score", where as [12, 96, 71, 66, 108, 23, 5, 59] would have a low "similarity score".

Is there a term for this, and how would I calculate it?<|endoftext|> <|assistant|> Yes, the term you may be looking for is "variance" or "standard deviation." Variance and standard deviation are statistical measures that indicate the dispersion or spread of a set of numbers. A smaller variance or standard deviation implies that the numbers are closer together, whereas a larger one implies the numbers are more spread out.

In your case, you can use standard deviation as a "similarity score." Lower standard deviation means higher similarity, and higher standard deviation means lower similarity.

Here's how to calculate the standard deviation:

  1. Calculate the mean (average) of the numbers in the dataset.
  2. Subtract the mean from each number and square the result.
  3. Calculate the mean of the squared differences.
  4. Take the square root of the mean of the squared differences.

Using the two arrays you gave. Here's a step-by-step example of computing the standard deviation:

Let's start with the first array [61, 62, 59, 60, 63, 62, 58]:

  1. Calculate the mean (average) of the numbers in the dataset. (61 + 62 + 59 + 60 + 63 + 62 + 58) / 7 = 425 / 7 = 60.714

  2. Subtract the mean from each number and square the result. (61 - 60.714)^2 = 0.085 (62 - 60.714)^2 = 1.647 (59 - 60.714)^2 = 2.947 (60 - 60.714)^2 = 0.510 (63 - 60.714)^2 = 5.247 (62 - 60.714)^2 = 1.647 (58 - 60.714)^2 = 7.347

  3. Calculate the mean of the squared differences. (0.085 + 1.647 + 2.947 + 0.510 + 5.247 + 1.647 + 7.347) / 7 = 18.48 / 7 = 2.640

  4. Take the square root of the mean of the squared differences. sqrt(2.640) = 1.625

The standard deviation for the first array is approximately 1.625.

Now for the second array [12, 96, 71, 66, 108, 23, 5, 59]:

  1. Calculate the mean (average) of the numbers in the dataset. (12 + 96 + 71 + 66 + 108 + 23 + 5 + 59) / 8 = 440 / 8 = 55

  2. Subtract the mean from each number and square the result. (12 - 55)^2 = 1849 (96 - 55)^2 = 1681 (71 - 55)^2 = 256 (66 - 55)^2 = 121 (108 - 55)^2 = 2809 (23 - 55)^2 = 1024 (5 - 55)^2 = 2500 (59 - 55)^2 = 16

  3. Calculate the mean of the squared differences. (1849 + 1681 + 256 + 121 + 2809 + 1024 + 2500 + 16) / 8 = 10256 / 8 = 1282

  4. Take the square root of the mean of the squared differences. sqrt(1282) = 35.818

The standard deviation for the second array is approximately 35.818.

So, the standard deviation (similarity score) for the first array is 1.625, and for the second array, it is 35.818. A lower score indicates higher similarity.<|endoftext|> <|user|> Is standart deviation the thing that shows up in graph as the shaded part of projections to show uncertainty?<|endoftext|> <|assistant|> Yes, that is correct. The standard deviation is a measure of how spread out data is. It is calculated by taking the square root of the variance. The variance is a measure of how much the data points vary from the mean.

The standard deviation is often used to show uncertainty in projections. For example, if you have a projection that shows the average temperature will be 50 degrees Fahrenheit, with a standard deviation of 5 degrees Fahrenheit, that means that the temperature could be anywhere from 45 degrees Fahrenheit to 55 degrees Fahrenheit.

The standard deviation is a useful tool for understanding uncertainty. It can help you to make decisions about how to plan for the future.<|endoftext|>

2023-11-14 02:09:53 - INFO - __main__ - Load pretrained model
neftune_noise_alpha - 5.0
training_args -
2023-11-14 02:09:53 - INFO - __main__ - Model loaded!
neftune_noise_alpha - 5.0
training_args - SFTConfig(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=apt-chat-yi-6B-sft-full,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=5,
log_level=info,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=data/apt-chat-yi-6B-sft-full/runs/Nov14_02-09-45_6191edb408fa,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=50,
logging_strategy=steps,
lr_scheduler_type=cosine,
max_grad_norm=1.0,
max_seq_length=4096,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=2,
optim=adamw_torch,
optim_args=None,
output_dir=data/apt-chat-yi-6B-sft-full,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=1,
per_device_train_batch_size=1,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['wandb'],
resume_from_checkpoint=None,
run_name=data/apt-chat-yi-6B-sft-full,
save_on_each_node=False,
save_safetensors=True,
save_steps=500,
save_strategy=no,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=False,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:145: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an AutoModelForCausalLM or a PeftModel (if you passed a peft_config) for you.
  warnings.warn(
[INFO|configuration_utils.py:717] 2023-11-14 02:09:53,295 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/config.json
[INFO|configuration_utils.py:717] 2023-11-14 02:09:53,384 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/config.json
[INFO|configuration_utils.py:777] 2023-11-14 02:09:53,386 >> Model config YiConfig {
  "_name_or_path": "01-ai/Yi-6B",
  "architectures": [
    "YiForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "01-ai/Yi-6B--configuration_yi.YiConfig",
    "AutoModel": "01-ai/Yi-6B--modeling_yi.YiModel",
    "AutoModelForCausalLM": "01-ai/Yi-6B--modeling_yi.YiForCausalLM"
  },
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "Yi",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "rope_theta": 5000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.35.0",
  "use_cache": true,
  "vocab_size": 64000
}

[INFO|modeling_utils.py:3121] 2023-11-14 02:09:53,499 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/model.safetensors.index.json [INFO|modeling_utils.py:1222] 2023-11-14 02:09:53,501 >> Instantiating YiForCausalLM model under default dtype torch.bfloat16. [INFO|configuration_utils.py:791] 2023-11-14 02:09:53,503 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0 }

Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.33s/it] /usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments. warnings.warn( Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.42s/it] Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.41s/it] Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.42s/it] [INFO|modeling_utils.py:3950] 2023-11-14 02:09:56,731 >> All model checkpoint weights were used when initializing YiForCausalLM.

[INFO|modeling_utils.py:3958] 2023-11-14 02:09:56,731 >> All the weights of YiForCausalLM were initialized from the model checkpoint at 01-ai/Yi-6B. If your task is similar to the task the model of the checkpoint was trained on, you can already use YiForCausalLM for predictions without further training. Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.44s/it] Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.46s/it] Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.46s/it] Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.44s/it] /usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments. warnings.warn( [INFO|configuration_utils.py:751] 2023-11-14 02:09:56,841 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/generation_config.json [INFO|configuration_utils.py:791] 2023-11-14 02:09:56,842 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0 }

/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn( [INFO|trainer.py:593] 2023-11-14 02:09:56,987 >> Using auto half precision backend 2023-11-14 02:09:56 - INFO - main - Train [2023-11-14 02:09:57,109] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.2, git-hash=unknown, git-branch=unknown [2023-11-14 02:09:59,652] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-11-14 02:09:59,654] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2023-11-14 02:09:59,654] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2023-11-14 02:09:59,673] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW [2023-11-14 02:09:59,673] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'> [2023-11-14 02:09:59,673] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False [2023-11-14 02:09:59,673] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer [2023-11-14 02:09:59,829] [INFO] [utils.py:802:see_memory_usage] Stage 3 initialize beginning [2023-11-14 02:09:59,830] [INFO] [utils.py:803:see_memory_usage] MA 11.35 GB Max_MA 11.42 GB CA 11.49 GB Max_CA 11 GB [2023-11-14 02:09:59,831] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2% [2023-11-14 02:09:59,832] [INFO] [stage3.py:126:init] Reduce bucket size 500,000,000 [2023-11-14 02:09:59,833] [INFO] [stage3.py:127:init] Prefetch bucket size 50,000,000 [2023-11-14 02:09:59,984] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [begin] [2023-11-14 02:09:59,985] [INFO] [utils.py:803:see_memory_usage] MA 11.35 GB Max_MA 11.35 GB CA 11.49 GB Max_CA 11 GB [2023-11-14 02:09:59,986] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2% 
Parameter Offload: Total persistent parameters: 266240 in 65 params [2023-11-14 02:10:00,226] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [end] [2023-11-14 02:10:00,227] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 11.41 GB CA 11.59 GB Max_CA 12 GB [2023-11-14 02:10:00,227] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2% [2023-11-14 02:10:00,352] [INFO] [utils.py:802:see_memory_usage] Before creating fp16 partitions [2023-11-14 02:10:00,353] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 1.47 GB CA 11.59 GB Max_CA 12 GB [2023-11-14 02:10:00,353] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2% [2023-11-14 02:10:01,732] [INFO] [utils.py:802:see_memory_usage] After creating fp16 partitions: 2 [2023-11-14 02:10:01,733] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 1.47 GB CA 1.48 GB Max_CA 12 GB [2023-11-14 02:10:01,733] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 26.43 GB, percent = 1.3% [2023-11-14 02:10:01,845] [INFO] [utils.py:802:see_memory_usage] Before creating fp32 partitions [2023-11-14 02:10:01,845] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 1.47 GB CA 1.48 GB Max_CA 1 GB [2023-11-14 02:10:01,846] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 26.43 GB, percent = 1.3% [2023-11-14 02:10:01,964] [INFO] [utils.py:802:see_memory_usage] After creating fp32 partitions [2023-11-14 02:10:01,964] [INFO] [utils.py:803:see_memory_usage] MA 4.3 GB Max_MA 5.71 GB CA 5.71 GB Max_CA 6 GB [2023-11-14 02:10:01,965] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 26.43 GB, percent = 1.3% [2023-11-14 02:10:02,365] [INFO] [utils.py:802:see_memory_usage] Before initializing optimizer states [2023-11-14 02:10:02,366] [INFO] [utils.py:803:see_memory_usage] MA 4.3 GB Max_MA 4.3 GB CA 5.71 GB Max_CA 6 GB [2023-11-14 02:10:02,366] [INFO] [utils.py:810:see_memory_usage] 
CPU Virtual Memory: used = 23.7 GB, percent = 1.2% [2023-11-14 02:10:02,553] [INFO] [utils.py:802:see_memory_usage] After initializing optimizer states [2023-11-14 02:10:02,554] [INFO] [utils.py:803:see_memory_usage] MA 9.94 GB Max_MA 15.59 GB CA 17.0 GB Max_CA 17 GB [2023-11-14 02:10:02,555] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.83 GB, percent = 1.2% [2023-11-14 02:10:02,555] [INFO] [stage3.py:460:_setup_for_real_optimizer] optimizer state initialized [2023-11-14 02:10:02,794] [INFO] [utils.py:802:see_memory_usage] After initializing ZeRO optimizer [2023-11-14 02:10:02,795] [INFO] [utils.py:803:see_memory_usage] MA 12.28 GB Max_MA 13.26 GB CA 17.0 GB Max_CA 17 GB [2023-11-14 02:10:02,795] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.86 GB, percent = 1.2% [2023-11-14 02:10:02,795] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW [2023-11-14 02:10:02,795] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler [2023-11-14 02:10:02,796] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None [2023-11-14 02:10:02,796] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.999), (0.9, 0.999)] [2023-11-14 02:10:02,797] [INFO] [config.py:972:print] DeepSpeedEngine configuration: [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] amp_enabled .................. False [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] amp_params ................... 
False [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] bfloat16_enabled ............. True [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_parallel_write_pipeline False [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_tag_validation_enabled True [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_tag_validation_fail False [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f3e8d053e50> [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] communication_data_type ...... None [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] curriculum_enabled_legacy .... False [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] curriculum_params_legacy ..... False [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] data_efficiency_enabled ...... False [2023-11-14 02:10:02,797] [INFO] [config.py:976:print] dataloader_drop_last ......... False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] disable_allgather ............ False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] dump_state ................... 
False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] dynamic_loss_scale_args ...... None [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_enabled ........... False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_gas_boundary_resolution 1 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_layer_num ......... 0 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_max_iter .......... 100 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_stability ......... 1e-06 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_tol ............... 0.01 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_verbose ........... False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] elasticity_enabled ........... False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_auto_cast ............... None [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_enabled ................. False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_master_weights_and_gradients False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] global_rank .................. 0 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] grad_accum_dtype ............. None [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_accumulation_steps .. 4 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_clipping ............ 0.0 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_predivide_factor .... 1.0 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] hybrid_engine ................ 
enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] initial_dynamic_scale ........ 1 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] load_universal_checkpoint .... False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] loss_scale ................... 1.0 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] memory_breakdown ............. False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] mics_hierarchial_params_gather False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] mics_shard_size .............. -1 [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_legacy_fusion ...... False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_name ............... None [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_params ............. None [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pld_enabled .................. False [2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pld_params ................... 
False [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] prescale_gradients ........... False [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] scheduler_name ............... None [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] scheduler_params ............. None [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] seq_parallel_communication_data_type torch.float32 [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] sparse_attention ............. None [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] sparse_gradients_enabled ..... False [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] steps_per_print .............. inf [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] train_batch_size ............. 32 [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] train_micro_batch_size_per_gpu 1 [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] use_node_local_storage ....... False [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] wall_clock_breakdown ......... False [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] weight_quantization_config ... None [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] world_size ................... 8 [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_allow_untested_optimizer True [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_config .................. 
stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_enabled ................. True [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_force_ds_cpu_optimizer .. True [2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_optimization_stage ...... 
3 [2023-11-14 02:10:02,799] [INFO] [config.py:962:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 1, "gradient_accumulation_steps": 4, "zero_optimization": { "stage": 3, "offload_optimizer": { "device": "none", "nvme_path": null }, "offload_param": { "device": "none", "nvme_path": null }, "stage3_gather_16bit_weights_on_model_save": true }, "steps_per_print": inf, "bf16": { "enabled": true }, "fp16": { "enabled": false }, "zero_allow_untested_optimizer": true } [INFO|trainer.py:1723] 2023-11-14 02:10:02,799 >> Running training [INFO|trainer.py:1724] 2023-11-14 02:10:02,799 >> Num examples = 285,436 [INFO|trainer.py:1725] 2023-11-14 02:10:02,799 >> Num Epochs = 2 [INFO|trainer.py:1726] 2023-11-14 02:10:02,799 >> Instantaneous batch size per device = 1 [INFO|trainer.py:1729] 2023-11-14 02:10:02,799 >> Total train batch size (w. parallel, distributed & accumulation) = 32 [INFO|trainer.py:1730] 2023-11-14 02:10:02,799 >> Gradient Accumulation steps = 4 [INFO|trainer.py:1731] 2023-11-14 02:10:02,799 >> Total optimization steps = 17,840 [INFO|trainer.py:1732] 2023-11-14 02:10:02,801 >> Number of trainable parameters = 6,061,035,520 [INFO|integration_utils.py:718] 2023-11-14 02:10:02,802 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true" wandb: Currently logged in as: developer-team018 (neural-network-018). Use wandb login --relogin to force relogin wandb: Tracking run with wandb version 0.16.0 wandb: Run data is saved locally in /workspace/alignment-handbook/wandb/run-20231114_021003-8xmy6gtd wandb: Run wandb offline to turn off syncing. 
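A rough cross-check of the ZeRO stage 3 memory readings earlier in this log, as a sketch rather than an exact accounting. It assumes that `see_memory_usage` reports GiB (bytes / 1024^3), that the bf16 weights, the fp32 master weights and the two AdamW moment buffers are each split evenly over the 8 ranks, and it takes the 1.47 GB reading after partitioning directly from the log. The variable names are illustrative only.

```python
GIB = 1024 ** 3

num_params = 6_061_035_520  # "Number of trainable parameters" from the log
world_size = 8              # "world_size" from the DeepSpeed config dump

# Unsharded bf16 weights (2 bytes per parameter), as loaded on each GPU
# before ZeRO stage 3 partitions them.
bf16_full_gib = num_params * 2 / GIB

# Per-rank shares once ZeRO stage 3 has partitioned the state.
bf16_shard_gib = bf16_full_gib / world_size
fp32_shard_gib = num_params * 4 / world_size / GIB  # fp32 master weights
adam_shard_gib = 2 * fp32_shard_gib                 # exp_avg + exp_avg_sq, fp32

# "MA" reading after creating the 16-bit partitions, copied from the log.
observed_after_partition_gib = 1.47

estimate_after_fp32 = observed_after_partition_gib + fp32_shard_gib
estimate_after_optimizer = estimate_after_fp32 + adam_shard_gib

print(f"bf16 weights, unsharded : {bf16_full_gib:.2f} GiB (log: MA 11.35 GB)")
print(f"bf16 shard per rank     : {bf16_shard_gib:.2f} GiB (log: MA 1.47 GB)")
print(f"after fp32 partitions   : {estimate_after_fp32:.2f} GiB (log: MA 4.3 GB)")
print(f"after optimizer states  : {estimate_after_optimizer:.2f} GiB (log: MA 9.94 GB)")
```

If these assumptions hold, roughly 10 to 12 GB per GPU is static state before the first forward pass. Activations for `max_seq_length=4096` with `gradient_checkpointing=False` come on top of that.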
wandb: Syncing run robust-plasma-26 wandb: ⭐️ View project at https://wandb.ai/neural-network-018/huggingface wandb: 🚀 View run at https://wandb.ai/neural-network-018/huggingface/runs/8xmy6gtd 0%| | 0/17840 [00:00<?, ?it/s][WARNING|tokenization_utils_base.py:3831] 2023-11-14 02:10:23,512 >> Token indices sequence length is longer than the specified maximum sequence length for this model (6114 > 4096). Running this sequence through the model will result in indexing errors {'loss': 1.7024, 'learning_rate': 1.9999999844947046e-05, 'epoch': 0.0}
{'loss': 1.1507, 'learning_rate': 1.999961237011484e-05, 'epoch': 0.01}
{'loss': 1.0928, 'learning_rate': 1.9998449510510744e-05, 'epoch': 0.01}
{'loss': 1.0793, 'learning_rate': 1.999651151133954e-05, 'epoch': 0.02}
{'loss': 1.0867, 'learning_rate': 1.999379852284651e-05, 'epoch': 0.02}
{'loss': 1.0857, 'learning_rate': 1.999031075535873e-05, 'epoch': 0.03}
{'loss': 1.0721, 'learning_rate': 1.9986048479268788e-05, 'epoch': 0.03}
{'loss': 1.0923, 'learning_rate': 1.99810120250138e-05, 'epoch': 0.04}
{'loss': 1.0836, 'learning_rate': 1.9975201783049804e-05, 'epoch': 0.04}
{'loss': 1.0769, 'learning_rate': 1.9968618203821487e-05, 'epoch': 0.05}
{'loss': 1.0574, 'learning_rate': 1.9961261797727256e-05, 'epoch': 0.06}
{'loss': 1.042, 'learning_rate': 1.9953133135079686e-05, 'epoch': 0.06}
{'loss': 1.0554, 'learning_rate': 1.9944232846061284e-05, 'epoch': 0.07}
{'loss': 1.0735, 'learning_rate': 1.993456162067566e-05, 'epoch': 0.07}
{'loss': 1.0785, 'learning_rate': 1.992412020869401e-05, 'epoch': 0.08}
{'loss': 1.0654, 'learning_rate': 1.9912909419596993e-05, 'epoch': 0.08}
{'loss': 1.0606, 'learning_rate': 1.9900930122511993e-05, 'epoch': 0.09}
{'loss': 1.0664, 'learning_rate': 1.988818324614572e-05, 'epoch': 0.1}
{'loss': 1.0604, 'learning_rate': 1.9874669778712215e-05, 'epoch': 0.1}
{'loss': 1.0674, 'learning_rate': 1.9860390767856244e-05, 'epoch': 0.11}
{'loss': 1.042, 'learning_rate': 1.984534732057208e-05, 'epoch': 0.11}
{'loss': 1.0452, 'learning_rate': 1.9829540603117667e-05, 'epoch': 0.12}
{'loss': 1.0577, 'learning_rate': 1.9812971840924222e-05, 'epoch': 0.12}
{'loss': 1.0471, 'learning_rate': 1.979564231850122e-05, 'epoch': 0.13}
{'loss': 1.0704, 'learning_rate': 1.977755337933682e-05, 'epoch': 0.13}
{'loss': 1.0282, 'learning_rate': 1.9758706425793702e-05, 'epoch': 0.14}
{'loss': 1.0515, 'learning_rate': 1.973910291900036e-05, 'epoch': 0.15}
{'loss': 1.0548, 'learning_rate': 1.97187443787378e-05, 'epoch': 0.15}
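For reference, the batch and step counts reported under "Running training" follow from the settings in this log. The sketch below reproduces them. The rounding (ceil over ranks, floor over accumulation steps) is an assumption about how the Trainer counts steps, and the variable names are illustrative.

```python
import math

num_examples = 285_436    # "Num examples"
per_device_batch = 1      # per_device_train_batch_size
world_size = 8            # number of GPUs in this run
grad_accum = 4            # gradient_accumulation_steps
num_epochs = 2            # num_train_epochs

# Sequences consumed per optimizer step across all GPUs.
total_batch = per_device_batch * world_size * grad_accum          # 32

# Micro-batches each rank sees in one pass over the data.
micro_batches_per_rank = math.ceil(num_examples / (per_device_batch * world_size))  # 35680

# Optimizer steps per epoch and in total.
updates_per_epoch = micro_batches_per_rank // grad_accum          # 8920
total_updates = updates_per_epoch * num_epochs                    # 17840

# Epoch fraction at step 1350, the last entry logged before the first evaluation.
epoch_at_1350 = round(1350 / updates_per_epoch, 2)                # 0.15

print(total_batch, updates_per_epoch, total_updates, epoch_at_1350)
```

With 6 GPUs and the same `gradient_accumulation_steps=4` (an assumption about the earlier run), the same formula gives 1 x 6 x 4 = 24. That total is the global batch per optimizer step; each GPU still processes one sequence at a time, so the total by itself does not raise per-GPU memory use.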
8%|██▌ | 1368/17840 [1:50:57<19:16:41, 4.21s/it][INFO|trainer.py:3158] 2023-11-14 04:01:02,181 >> Running Evaluation [INFO|trainer.py:3160] 2023-11-14 04:01:02,182 >> Num examples = 500 [INFO|trainer.py:3163] 2023-11-14 04:01:02,182 >> Batch size = 1

{'eval_loss': 1.0247304439544678, 'eval_runtime': 4.5889, 'eval_samples_per_second': 108.959, 'eval_steps_per_second': 13.729, 'epoch': 0.15}

{'loss': 0.9636, 'learning_rate': 1.9697632383321755e-05, 'epoch': 1.0}
{'loss': 0.9026, 'learning_rate': 1.96757685694803e-05, 'epoch': 1.01}
{'loss': 0.8808, 'learning_rate': 1.965315463222695e-05, 'epoch': 1.01}
{'loss': 0.8712, 'learning_rate': 1.9629792324729302e-05, 'epoch': 1.02}
{'loss': 0.8967, 'learning_rate': 1.960568345817306e-05, 'epoch': 1.03}
{'loss': 0.8676, 'learning_rate': 1.9580829901621666e-05, 'epoch': 1.03}
{'loss': 0.8723, 'learning_rate': 1.9555233581871366e-05, 'epoch': 1.04}
{'loss': 0.9122, 'learning_rate': 1.9528896483301866e-05, 'epoch': 1.04}
{'loss': 0.8687, 'learning_rate': 1.9501820647722458e-05, 'epoch': 1.05}
{'loss': 0.8726, 'learning_rate': 1.947400817421375e-05, 'epoch': 1.05}
{'loss': 0.8505, 'learning_rate': 1.944546121896493e-05, 'epoch': 1.06}
{'loss': 0.8458, 'learning_rate': 1.9416181995106585e-05, 'epoch': 1.07}
{'loss': 0.8721, 'learning_rate': 1.9386172772539162e-05, 'epoch': 1.07}
{'loss': 0.8676, 'learning_rate': 1.9355435877756957e-05, 'epoch': 1.08}
{'loss': 0.8826, 'learning_rate': 1.9323973693667762e-05, 'epoch': 1.08}
{'loss': 0.8607, 'learning_rate': 1.929178865940815e-05, 'epoch': 1.09}
{'loss': 0.8561, 'learning_rate': 1.925888327015434e-05, 'epoch': 1.09}
{'loss': 0.8687, 'learning_rate': 1.9225260076928783e-05, 'epoch': 1.1}
{'loss': 0.874, 'learning_rate': 1.919092168640239e-05, 'epoch': 1.1}
{'loss': 0.8563, 'learning_rate': 1.915587076069243e-05, 'epoch': 1.11}
{'loss': 0.8445, 'learning_rate': 1.9120110017156172e-05, 'epoch': 1.12}
{'loss': 0.8646, 'learning_rate': 1.908364222818019e-05, 'epoch': 1.12}
{'loss': 0.8479, 'learning_rate': 1.9046470220965457e-05, 'epoch': 1.13}
{'loss': 0.8788, 'learning_rate': 1.9008596877308157e-05, 'epoch': 1.13}
{'loss': 0.9, 'learning_rate': 1.8970025133376252e-05, 'epoch': 1.14}
{'loss': 0.8791, 'learning_rate': 1.893075797948188e-05, 'epoch': 1.14}
{'loss': 0.9254, 'learning_rate': 1.889079845984951e-05, 'epoch': 1.15}
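The jump from epoch 0.15 to 1.0 can be checked against the logged values. The sketch below assumes the standard cosine schedule without warmup (`warmup_steps=0` in the config), that entries are logged every 50 optimizer steps, and that each pass over the dataloader ended after 1368 optimizer steps, as the progress bars at the two evaluations suggest. The function names are illustrative.

```python
import math

BASE_LR = 2e-5                    # learning_rate
PLANNED_STEPS = 17_840            # "Total optimization steps"
PLANNED_STEPS_PER_EPOCH = 8_920   # PLANNED_STEPS / 2 epochs
OBSERVED_STEPS_PER_PASS = 1_368   # step count shown when each evaluation started


def cosine_lr(step: int) -> float:
    """Cosine decay from BASE_LR to 0 over PLANNED_STEPS, no warmup."""
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * step / PLANNED_STEPS))


def epoch_label(step: int) -> float:
    """Epoch value if a pass ends every OBSERVED_STEPS_PER_PASS steps but
    progress inside a pass is still measured against the planned length."""
    passes_done, index_in_pass = divmod(step - 1, OBSERVED_STEPS_PER_PASS)
    return passes_done + (index_in_pass + 1) / PLANNED_STEPS_PER_EPOCH


for step in (1350, 1400, 2700):
    print(step, f"{cosine_lr(step):.6e}", round(epoch_label(step), 2))
```

Under those assumptions the learning rate keeps following the planned 17,840-step schedule while only the epoch label jumps. That is consistent with the dataloader running out after 1368 steps instead of the planned 8920. Why it runs out early is not visible in this log.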
15%|█████ | 2736/17840 [3:42:25<17:42:31, 4.22s/it][INFO|trainer.py:3158] 2023-11-14 05:52:30,316 >> Running Evaluation [INFO|trainer.py:3160] 2023-11-14 05:52:30,317 >> Num examples = 500 [INFO|trainer.py:3163] 2023-11-14 05:52:30,317 >> Batch size = 1

{'eval_loss': 1.0676991939544678, 'eval_runtime': 4.5191, 'eval_samples_per_second': 110.641, 'eval_steps_per_second': 13.941, 'epoch': 1.15}
[INFO|trainer.py:1955] 2023-11-14 05:52:34,837 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 13352.0365, 'train_samples_per_second': 42.755, 'train_steps_per_second': 1.336, 'train_loss': 0.9719247023264567, 'epoch': 1.15}
15%|█████ | 2736/17840 [3:42:30<20:28:20, 4.88s/it]
train metrics
epoch = 1.15
train_loss = 0.9719
train_runtime = 3:42:32.03
train_samples = 285436
train_samples_per_second = 42.755
train_steps_per_second = 1.336
2023-11-14 05:52:34 - INFO - main - Evaluate
[INFO|trainer.py:3158] 2023-11-14 05:52:34,843 >> Running Evaluation
[INFO|trainer.py:3160] 2023-11-14 05:52:34,843 >> Num examples = 500
[INFO|trainer.py:3163] 2023-11-14 05:52:34,844 >> Batch size = 1
14%|██████▎ | 9/63 [00:02<00:16, 3.23it/s]
eval metrics
epoch = 1.15
eval_loss = 1.0677
eval_runtime = 0:00:04.48
eval_samples = 500
eval_samples_per_second = 111.451
eval_steps_per_second = 14.043
2023-11-14 05:52:39 - INFO - main - Save model
[INFO|trainer.py:2881] 2023-11-14 05:52:43,590 >> Saving model checkpoint to data/apt-chat-yi-6B-sft-full
[INFO|configuration_utils.py:461] 2023-11-14 05:52:43,592 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
[INFO|configuration_utils.py:564] 2023-11-14 05:52:43,592 >> Configuration saved in data/apt-chat-yi-6B-sft-full/generation_config.json
[INFO|modeling_utils.py:2201] 2023-11-14 05:52:51,334 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at data/apt-chat-yi-6B-sft-full/model.safetensors.index.json.
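The throughput figures in the train metrics are worth reading with care. They match what one gets from the planned totals (2 epochs over 285,436 examples, 17,840 steps) rather than from the 2,736 steps that actually ran. That reading is an assumption based on the arithmetic below, not something the log states.

```python
train_runtime_s = 13352.0365   # 'train_runtime'
num_examples = 285_436         # train_samples
num_epochs = 2                 # num_train_epochs
planned_steps = 17_840         # "Total optimization steps"
actual_steps = 2_736           # last step shown in the progress bar
total_batch = 32               # "Total train batch size"

# Figures derived from the planned totals.
planned_samples_per_s = num_examples * num_epochs / train_runtime_s
planned_steps_per_s = planned_steps / train_runtime_s

# Figures derived from the steps that were actually executed.
actual_s_per_step = train_runtime_s / actual_steps
actual_samples_per_s = actual_steps * total_batch / train_runtime_s

print(f"planned-based: {planned_samples_per_s:.3f} samples/s, {planned_steps_per_s:.3f} steps/s")
print(f"actual-based : {actual_s_per_step:.2f} s/step, {actual_samples_per_s:.2f} samples/s")
```

The planned-based values reproduce the reported 42.755 samples/s and 1.336 steps/s, while the actual-based step time reproduces the 4.88 s/it of the final progress bar. If each optimizer step really consumed 32 sequences, the effective rate was closer to 6.6 samples/s.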
[INFO|tokenization_utils_base.py:2428] 2023-11-14 05:52:51,336 >> tokenizer config file saved in data/apt-chat-yi-6B-sft-full/tokenizer_config.json [INFO|tokenization_utils_base.py:2437] 2023-11-14 05:52:51,337 >> Special tokens file saved in data/apt-chat-yi-6B-sft-full/special_tokens_map.json [INFO|trainer.py:2881] 2023-11-14 05:52:55,599 >> Saving model checkpoint to data/apt-chat-yi-6B-sft-full [INFO|configuration_utils.py:461] 2023-11-14 05:52:55,601 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json [INFO|configuration_utils.py:564] 2023-11-14 05:52:55,601 >> Configuration saved in data/apt-chat-yi-6B-sft-full/generation_config.json [INFO|modeling_utils.py:2201] 2023-11-14 05:53:06,302 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at data/apt-chat-yi-6B-sft-full/model.safetensors.index.json. [INFO|tokenization_utils_base.py:2428] 2023-11-14 05:53:06,303 >> tokenizer config file saved in data/apt-chat-yi-6B-sft-full/tokenizer_config.json [INFO|tokenization_utils_base.py:2437] 2023-11-14 05:53:06,304 >> Special tokens file saved in data/apt-chat-yi-6B-sft-full/special_tokens_map.json model-00001-of-00003.safetensors: 0%| | 0.00/4.93G [00:00<?, ?B/s] model-00002-of-00003.safetensors: 0%| | 0.00/4.98G [00:00<?, ?B/s]

Upload 4 LFS files: 0%| | 0/4 [00:00<?, ?it/s]
model-00003-of-00003.safetensors: 0%| | 0.00/2.21G [00:00<?, ?B/s]
training_args.bin: 0%| | 0.00/5.62k [00:00<?, ?B/s]
training_args.bin: 100%|███████████████████| 5.62k/5.62k [00:00<00:00, 15.4kB/s]
model-00003-of-00003.safetensors: 100%|████| 2.21G/2.21G [00:58<00:00, 37.6MB/s]
model-00002-of-00003.safetensors: 76%|███ | 3.76G/4.98G [01:34<00:28, 43.2MB/s]
model-00001-of-00003.safetensors: 77%|███ | 3.81G/4.93G [01:36<00:33, 34.1MB/s]
model-00001-of-00003.safetensors: 78%|███ | 3.84G/4.93G [01:36<00:27, 40.0MB/s] model-00001-of-00003.safetensors: 78%|███ | 3.85G/4.93G [01:37<00:29, 36.5MB/s] model-00001-of-00003.safetensors: 78%|███▏| 3.85G/4.93G [01:37<00:28, 37.8MB/s] model-00002-of-00003.safetensors: 78%|███ | 3.87G/4.98G [01:37<00:27, 40.9MB/s] model-00002-of-00003.safetensors: 78%|███ | 3.89G/4.98G [01:37<00:26, 41.2MB/s] model-00001-of-00003.safetensors: 78%|███▏| 3.86G/4.93G [01:38<01:08, 15.8MB/s] model-00001-of-00003.safetensors: 78%|███▏| 3.87G/4.93G [01:38<00:49, 21.6MB/s] model-00001-of-00003.safetensors: 79%|███▏| 3.89G/4.93G [01:38<00:38, 27.2MB/s] model-00002-of-00003.safetensors: 79%|███▏| 3.95G/4.98G [01:39<00:21, 47.6MB/s] model-00001-of-00003.safetensors: 79%|███▏| 3.90G/4.93G [01:39<00:42, 24.1MB/s] model-00001-of-00003.safetensors: 79%|███▏| 3.92G/4.93G [01:39<00:33, 29.9MB/s] model-00002-of-00003.safetensors: 80%|███▏| 4.00G/4.98G [01:40<00:20, 47.2MB/s] model-00001-of-00003.safetensors: 80%|███▏| 3.94G/4.93G [01:40<00:32, 31.1MB/s] model-00002-of-00003.safetensors: 81%|███▏| 4.03G/4.98G [01:40<00:18, 52.4MB/s] model-00001-of-00003.safetensors: 80%|███▏| 3.95G/4.93G [01:40<00:32, 30.2MB/s] model-00001-of-00003.safetensors: 80%|███▏| 3.97G/4.93G [01:41<00:29, 32.6MB/s] model-00001-of-00003.safetensors: 81%|███▏| 3.98G/4.93G [01:41<00:26, 36.3MB/s] model-00001-of-00003.safetensors: 81%|███▏| 4.00G/4.93G [01:42<00:23, 40.2MB/s] model-00001-of-00003.safetensors: 81%|███▎| 4.02G/4.93G [01:42<00:23, 39.6MB/s] model-00001-of-00003.safetensors: 82%|███▎| 4.03G/4.93G [01:42<00:20, 42.9MB/s] model-00001-of-00003.safetensors: 82%|███▎| 4.06G/4.93G [01:43<00:18, 48.2MB/s] model-00001-of-00003.safetensors: 83%|███▎| 4.08G/4.93G [01:43<00:17, 49.5MB/s] model-00001-of-00003.safetensors: 83%|███▎| 4.10G/4.93G [01:43<00:16, 51.6MB/s] model-00001-of-00003.safetensors: 83%|███▎| 4.11G/4.93G [01:44<00:16, 50.8MB/s] model-00001-of-00003.safetensors: 84%|███▎| 4.13G/4.93G [01:44<00:15, 51.7MB/s] 
model-00001-of-00003.safetensors: 84%|███▎| 4.14G/4.93G [01:44<00:15, 50.3MB/s] model-00001-of-00003.safetensors: 84%|███▎| 4.16G/4.93G [01:45<00:16, 45.9MB/s] model-00001-of-00003.safetensors: 85%|███▍| 4.19G/4.93G [01:45<00:15, 47.5MB/s] model-00001-of-00003.safetensors: 85%|███▍| 4.21G/4.93G [01:46<00:14, 49.7MB/s] model-00001-of-00003.safetensors: 86%|███▍| 4.22G/4.93G [01:46<00:17, 40.1MB/s] model-00001-of-00003.safetensors: 86%|███▍| 4.24G/4.93G [01:47<00:16, 42.4MB/s] model-00002-of-00003.safetensors: 87%|███▍| 4.32G/4.98G [01:47<00:15, 41.9MB/s] model-00001-of-00003.safetensors: 86%|███▍| 4.26G/4.93G [01:47<00:17, 39.1MB/s] model-00001-of-00003.safetensors: 87%|███▍| 4.27G/4.93G [01:48<00:16, 40.7MB/s] model-00001-of-00003.safetensors: 87%|███▍| 4.29G/4.93G [01:48<00:15, 42.4MB/s] model-00001-of-00003.safetensors: 87%|███▍| 4.30G/4.93G [01:48<00:14, 44.5MB/s] model-00001-of-00003.safetensors: 88%|███▌| 4.32G/4.93G [01:49<00:13, 45.2MB/s] model-00002-of-00003.safetensors: 89%|███▌| 4.42G/4.98G [01:49<00:11, 47.4MB/s] model-00001-of-00003.safetensors: 88%|███▌| 4.34G/4.93G [01:49<00:19, 31.1MB/s] model-00002-of-00003.safetensors: 89%|███▌| 4.45G/4.98G [01:49<00:11, 46.4MB/s] model-00001-of-00003.safetensors: 88%|███▌| 4.35G/4.93G [01:50<00:17, 33.3MB/s] model-00001-of-00003.safetensors: 89%|███▌| 4.37G/4.93G [01:50<00:15, 35.6MB/s] model-00001-of-00003.safetensors: 89%|███▌| 4.40G/4.93G [01:51<00:12, 43.2MB/s] model-00002-of-00003.safetensors: 91%|███▋| 4.51G/4.98G [01:51<00:10, 46.0MB/s] model-00001-of-00003.safetensors: 90%|███▌| 4.42G/4.93G [01:51<00:11, 43.8MB/s] model-00001-of-00003.safetensors: 90%|███▌| 4.45G/4.93G [01:52<00:10, 44.4MB/s] model-00002-of-00003.safetensors: 92%|███▋| 4.56G/4.98G [01:52<00:09, 44.0MB/s] model-00001-of-00003.safetensors: 91%|███▋| 4.48G/4.93G [01:52<00:09, 46.9MB/s] model-00001-of-00003.safetensors: 91%|███▋| 4.50G/4.93G [01:53<00:09, 44.8MB/s] model-00001-of-00003.safetensors: 91%|███▋| 4.51G/4.93G [01:53<00:09, 42.9MB/s] 
model-00001-of-00003.safetensors: 92%|███▋| 4.53G/4.93G [01:54<00:09, 41.4MB/s] model-00002-of-00003.safetensors: 93%|███▋| 4.64G/4.98G [01:54<00:08, 38.3MB/s] model-00002-of-00003.safetensors: 94%|███▋| 4.66G/4.98G [01:54<00:07, 42.3MB/s] model-00001-of-00003.safetensors: 93%|███▋| 4.58G/4.93G [01:55<00:08, 42.4MB/s] model-00001-of-00003.safetensors: 93%|███▋| 4.59G/4.93G [01:55<00:07, 45.2MB/s] model-00001-of-00003.safetensors: 93%|███▋| 4.61G/4.93G [01:56<00:06, 46.8MB/s] model-00001-of-00003.safetensors: 94%|███▊| 4.64G/4.93G [01:56<00:05, 51.6MB/s] model-00002-of-00003.safetensors: 95%|███▊| 4.74G/4.98G [01:56<00:05, 42.0MB/s] model-00001-of-00003.safetensors: 95%|███▊| 4.67G/4.93G [01:57<00:04, 52.7MB/s] model-00002-of-00003.safetensors: 96%|███▊| 4.77G/4.98G [01:57<00:04, 46.4MB/s] model-00001-of-00003.safetensors: 95%|███▊| 4.69G/4.93G [01:57<00:04, 52.0MB/s] model-00001-of-00003.safetensors: 95%|███▊| 4.70G/4.93G [01:58<00:05, 43.0MB/s] model-00001-of-00003.safetensors: 96%|███▊| 4.74G/4.93G [01:58<00:04, 47.6MB/s] model-00001-of-00003.safetensors: 96%|███▊| 4.75G/4.93G [01:59<00:03, 48.0MB/s] model-00001-of-00003.safetensors: 97%|███▊| 4.77G/4.93G [01:59<00:03, 48.3MB/s] model-00001-of-00003.safetensors: 97%|███▉| 4.78G/4.93G [01:59<00:03, 46.4MB/s] model-00002-of-00003.safetensors: 97%|███▉| 4.85G/4.98G [01:59<00:03, 40.0MB/s] model-00001-of-00003.safetensors: 97%|███▉| 4.80G/4.93G [02:00<00:02, 45.9MB/s] model-00002-of-00003.safetensors: 98%|███▉| 4.88G/4.98G [02:00<00:02, 44.1MB/s] model-00001-of-00003.safetensors: 98%|███▉| 4.82G/4.93G [02:00<00:03, 34.5MB/s] model-00002-of-00003.safetensors: 99%|███▉| 4.91G/4.98G [02:01<00:01, 47.9MB/s] model-00001-of-00003.safetensors: 98%|███▉| 4.85G/4.93G [02:01<00:02, 38.3MB/s] model-00001-of-00003.safetensors: 99%|███▉| 4.86G/4.93G [02:01<00:01, 41.2MB/s] model-00001-of-00003.safetensors: 99%|███▉| 4.88G/4.93G [02:02<00:01, 43.4MB/s] model-00002-of-00003.safetensors: 100%|████| 4.98G/4.98G [02:02<00:00, 
40.6MB/s] model-00001-of-00003.safetensors: 100%|████| 4.93G/4.93G [02:03<00:00, 39.9MB/s]

Upload 4 LFS files: 100%|█████████████████████████| 4/4 [02:03<00:00, 31.00s/it]
2023-11-14 05:55:20 - INFO - main - Model saved to data/apt-chat-yi-6B-sft-full
[INFO|modelcard.py:452] 2023-11-14 05:55:21,054 >> Dropping the following result as it does not have all the necessary fields: {'dataset': {'name': 'communityai/apt-chat-micro-dataset-llm-v2-714k', 'type': 'communityai/apt-chat-micro-dataset-llm-v2-714k'}}
[INFO|configuration_utils.py:461] 2023-11-14 05:55:21,057 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
2023-11-14 05:55:21 - INFO - main - Pushing to hub...

edbeeching commented 9 months ago

This is probably related to flash attn being disabled and the large prompt limit of 4096. Are you using deepspeed? Do the yi models not support flash-attn?
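As a rough illustration of why disabled flash attention and a 4096-token limit interact badly: a non-fused attention implementation typically materializes a seq_len x seq_len score matrix per head, so that buffer grows quadratically with sequence length. The sketch below is only back-of-the-envelope arithmetic. The head count (32) and the 2-byte bf16 element size are assumptions, and the real peak also depends on softmax copies, the number of layers whose activations are kept, and gradient checkpointing.

```python
GIB = 1024 ** 3


def attention_scores_bytes(batch_size, num_heads, seq_len, bytes_per_element=2):
    """Bytes needed to hold one layer's (seq_len x seq_len) attention scores.

    Assumes the scores are fully materialized, as a non-fused attention
    implementation typically does. bytes_per_element=2 corresponds to bf16.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element


# Assumed values for illustration: 32 attention heads, one sample per device.
for seq_len in (1024, 2048, 4096):
    gib = attention_scores_bytes(batch_size=1, num_heads=32, seq_len=seq_len) / GIB
    print(f"seq_len={seq_len}: {gib:.4f} GiB per layer per sample")
```

Under these assumptions, going from 2048 to 4096 tokens quadruples this buffer, which is why lowering the sequence limit or enabling flash attention (where the model supports it) is usually the first thing to try.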

bugface commented 8 months ago

> This is probably related to flash attn being disabled and the large prompt limit of 4096. Are you using deepspeed? Do the yi models not support flash-attn?

Hi @edbeeching, just curious whether you can run full SFT on a 7B model without DeepSpeed? I have tried and have never been able to run it with multi-gpu.yaml. Only DeepSpeed works, with stage 2 or stage 3.
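A rough estimate may explain this observation. The sketch below assumes that multi-gpu.yaml means plain data parallelism (every GPU holds a full replica) and that training uses mixed-precision Adam at roughly 16 bytes per parameter (2 for bf16 weights, 2 for gradients, 12 for fp32 master weights plus the two Adam moments). The 7e9 parameter count, the 8-GPU setup, and the function name are illustrative assumptions, and activations are ignored entirely.

```python
def per_gpu_model_state_gb(num_params, num_gpus, zero_stage=0):
    """Approximate per-GPU memory (GB) for weights, gradients and Adam states.

    Assumes mixed-precision Adam: 2 bytes/param for weights, 2 for gradients,
    12 for optimizer states (fp32 master weights + two moments).
    zero_stage=0 models plain data parallelism with no sharding.
    Activations, buffers and fragmentation are not included.
    """
    param_bytes, grad_bytes, optim_bytes = 2.0, 2.0, 12.0
    if zero_stage >= 1:  # stage 1 shards optimizer states
        optim_bytes /= num_gpus
    if zero_stage >= 2:  # stage 2 also shards gradients
        grad_bytes /= num_gpus
    if zero_stage >= 3:  # stage 3 also shards the weights
        param_bytes /= num_gpus
    return num_params * (param_bytes + grad_bytes + optim_bytes) / 1e9


# Assumed for illustration: a nominal 7e9-parameter model on 8 GPUs.
for stage in (0, 2, 3):
    gb = per_gpu_model_state_gb(7e9, num_gpus=8, zero_stage=stage)
    print(f"ZeRO stage {stage}: ~{gb:.2f} GB of model state per GPU")
```

Under these assumptions the unsharded case needs about 112 GB per GPU before any activations, which exceeds an 80 GB card, while stage 2 and stage 3 bring the model state down to roughly 26 GB and 14 GB per GPU respectively.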