Open apt-team-018 opened 10 months ago
Hi, is the repo you are referring to this one or a different one? Your question was not clear on that point.
Yes, the repository I'm referring to is 'huggingface/alignment-handbook'. Despite having 6 H100 GPUs for a 7B-parameter model, I'm running into out-of-memory errors. I've set per_device_train_batch_size = 1, but the final batch size somehow ends up being 24, which is likely causing the memory overflow. This issue is also preventing me from fine-tuning a 34-billion-parameter model on this setup.
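As a sanity check on where the 24 could come from: the Hugging Face Trainer reports its total train batch size as the per-device batch size times the number of processes times the gradient accumulation steps. A minimal sketch, assuming the 6-GPU run used the same gradient_accumulation_steps: 4 as the config posted below (the helper name is made up for illustration):

```python
def total_train_batch_size(per_device: int, num_devices: int, grad_accum_steps: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


# Assumed values: per_device_train_batch_size=1, gradient_accumulation_steps=4.
print("6 GPUs:", total_train_batch_size(1, 6, 4))
print("8 GPUs:", total_train_batch_size(1, 8, 4))
```

If that assumption holds, 1 x 6 x 4 accounts for the 24. Each GPU still processes one sequence per forward/backward pass, so gradient accumulation by itself should not raise peak activation memory; max_seq_length: 4096 and gradient_checkpointing: false are more likely to matter there.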
Additionally, I attempted to fine-tune a 6-billion-parameter model on 8 A100 GPUs, but training was interrupted. The first attempt stopped at 0.15 epochs. On the second attempt, where I started from 2 epochs, it oddly skipped some epochs, jumping from 0.15 directly to 1, and then stopped at 2.25. For more details, see this WandB run: https://wandb.ai/neural-network-018/huggingface/runs/8xmy6gtd/
Configs -
model_name_or_path: 01-ai/Yi-6B
model_revision: main
torch_dtype: bfloat16
use_flash_attention_2: false
trust_remote_code: true
dataset_mixer:
  communityai/apt-chat-micro-dataset-llm-v2-714k: 0.4
dataset_splits:
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: false
hub_model_id: apt-chat-yi-6B-sft-full
hub_strategy: every_save
learning_rate: 0.00002
log_level: info
logging_steps: 50
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 4096
max_steps: -1
num_train_epochs: 2
output_dir: data/apt-chat-yi-6B-sft-full
overwrite_output_dir: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
push_to_hub: true
remove_unused_columns: true
report_to:
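For a rough sense of whether the model states alone fit, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 16 bytes per parameter for mixed-precision Adam (half-precision weights and gradients plus fp32 master weights, momentum, and variance) and that ZeRO stage 3, which the log below reports, shards all of it evenly across GPUs. The function name and the round parameter counts are illustrative, and activations, communication buffers, and fragmentation are not included:

```python
def zero3_model_state_gib(num_params: float, num_gpus: int, bytes_per_param: int = 16) -> float:
    """Per-GPU memory (GiB) for weights, gradients, and Adam states when fully sharded."""
    return num_params * bytes_per_param / num_gpus / 2**30


# Round parameter counts taken from the model sizes mentioned in the post.
for label, n_params, n_gpus in [("6B on 8 GPUs", 6e9, 8), ("34B on 6 GPUs", 34e9, 6)]:
    gib = zero3_model_state_gib(n_params, n_gpus)
    print(f"{label}: ~{gib:.1f} GiB per GPU before activations")
```

Comparing these figures against the memory of the cards in use gives a first hint of whether CPU offload or more GPUs would be needed; actual DeepSpeed usage will differ from this estimate.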
LOGS -
INFO:root:Using nproc_per_node=8.
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING]
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING]
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2023-11-14 02:09:37,658] torch.distributed.run: [WARNING]
[2023-11-14 02:09:45,328] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,584] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
[2023-11-14 02:09:45,607] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:45 - WARNING - main - Process rank: 7, device: cuda:7, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:45,646] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,793] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,832] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,834] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-11-14 02:09:45,835] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
[2023-11-14 02:09:45,864] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:45,908] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
2023-11-14 02:09:45 - WARNING - main - Process rank: 5, device: cuda:5, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:45,939] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:45,939] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
2023-11-14 02:09:45 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-14 02:09:45 - INFO - main - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='01-ai/Yi-6B', model_revision='main', model_code_revision=None, torch_dtype='bfloat16', trust_remote_code=True, use_flash_attention_2=False, use_peft=False, lora_r=16, lora_alpha=32, lora_dropout=0.05, lora_target_modules=None, lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=False, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)
2023-11-14 02:09:45 - INFO - main - Data parameters DataArguments(chat_template=None, dataset_mixer={'communityai/apt-chat-micro-dataset-llm-v2-714k': 0.4}, dataset_splits=['train', 'test'], max_train_samples=None, max_eval_samples=None, preprocessing_num_workers=12, truncation_side=None)
2023-11-14 02:09:45 - INFO - main - Training/evaluation parameters SFTConfig(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=None,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=4,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=apt-chat-yi-6B-sft-full,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=optimize_cuda_cache
arguement will be deprecated soon, please use optimize_device_cache
instead.
warnings.warn(
[2023-11-14 02:09:46,074] [INFO] [comm.py:637:init_distributed] cdb=None
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
2023-11-14 02:09:46 - WARNING - main - Process rank: 2, device: cuda:2, n_gpu: 1 distributed training: True, 16-bits training: False
[2023-11-14 02:09:46,109] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:46,110] [INFO] [comm.py:637:init_distributed] cdb=None
[2023-11-14 02:09:46,118] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:46 - WARNING - main - Process rank: 6, device: cuda:6, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-14 02:09:46 - WARNING - main - Process rank: 4, device: cuda:4, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-14 02:09:46 - WARNING - main - Process rank: 3, device: cuda:3, n_gpu: 1 distributed training: True, 16-bits training: False
/usr/local/lib/python3.10/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
[2023-11-14 02:09:46,193] [INFO] [comm.py:637:init_distributed] cdb=None
2023-11-14 02:09:46 - WARNING - main - Process rank: 1, device: cuda:1, n_gpu: 1 distributed training: True, 16-bits training: False
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
Overwrite dataset info from restored data version if exists.
2023-11-14 02:09:47 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:47 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
2023-11-14 02:09:47 - INFO - datasets.builder - Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:47 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by mode='default'.
table = cls._concat_blocks(blocks, axis=0)
Overwrite dataset info from restored data version if exists.
2023-11-14 02:09:48 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:48 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
2023-11-14 02:09:48 - INFO - datasets.builder - Found cached dataset apt-chat-micro-dataset-llm-v2-714k (/root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5)
Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
2023-11-14 02:09:48 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityai_apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-af78090beb4300c1.arrow
2023-11-14 02:09:48 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityai__apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-af78090beb4300c1.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityaiapt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-2bfe21b70f725afe.arrow
2023-11-14 02:09:48 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/communityai_apt-chat-micro-dataset-llm-v2-714k/default/0.0.0/2fca38419c0e73a5/cache-2bfe21b70f725afe.arrow
2023-11-14 02:09:48 - INFO - main - Training on the following datasets and their proportions: ['train : 285436', 'test : 500']
++++++++++++++++++++++++++++++++++++++
YiTokenizer(name_or_path='01-ai/Yi-6B', vocab_size=64000, model_max_length=4096, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': '
<|system|> You are an Information Extraction Specialist AI. When presented with dense or multifaceted content, meticulously identify, extract, and present the key pieces of information embedded within. Your responses should distill the most pertinent details, streamlining the data into a more accessible and concise format. Prioritize accuracy and clarity, ensuring that extracted information maintains its original context and significance.<|endoftext|> <|user|> There are several roles that militant groups fill with child soldiers. In many cases, children participate directly in conflict, but they can also be used for other dangerous support roles. Many are porters who carry heavy loads of ammunition or injured soldiers, while others are lookouts or cooks. Girls are often forced to be sex slaves.
Many children are forced to join military groups at a young age. Child soldiers are also easier to manipulate and force into conflict. Recruiters typically target children from troubled areas or conflict zones, likely accustomed to violence and with fewer educational or work opportunities.
This separation leaves children without any means of safety or security, so they choose to become child soldiers as a form of protection.
A child soldier is not just someone who is involved in fighting. They can also be those in other roles such as cooks, porters, messengers, human shields, spies, suicide bombers or those used for sexual exploitation. It includes children recruited and trained for military purposes, but not used in war.
Child soldiers are children (individuals under the age of 18) who are used for military purposes. According to the United Nations Convention on the Rights of the Child and international human rights law, no child under 18 may be recruited into armed forces (government military) or armed rebel groups (militias and gangs).
why do people use child soldiers?<|endoftext|> <|assistant|> People use child soldiers because they are easier to manipulate and force into conflict[2]. They can be used for many roles in a military group, such as fighting, carrying heavy loads of ammunition, being lookouts, cooks, or sex slaves[1]. Additionally, children have fewer educational or work opportunities and this separation leaves them without any means of safety or security, so they often choose to become child soldiers as a form of protection[3]. Moreover, child soldiers are not just those involved in fighting, but can also include those in other roles such as cooks, porters, messengers, human shields, spies, suicide bombers, and those used for sexual exploitation[4]. According to the United Nations Convention on the Rights of the Child and international human rights law, no child under 18 may be recruited into armed forces or armed rebel groups[5].<|endoftext|>
2023-11-14 02:09:49 - INFO - main - Sample 13112 of the processed training set:
<|system|> You are an AI assistant. Respond with accuracy and coherence.<|endoftext|> <|user|> Rewrite: Task: Research and analyze the socio-economic impact of illicit drug manufacturing on local communities in developing countries.
Rewritten:
Task: Conduct a detailed analysis of the complex interplay between illicit drug manufacturing and socio-economic factors in developing countries. Specifically, explore the economic benefits and drawbacks of drug production and the resulting social impacts on local communities, including addiction rates and crime. Use case studies from specific developing countries to inform your recommendations for effective policies aimed at mitigating the negative effects of drug production on communities. Your report should consist of an introduction, an overview of illicit drug manufacturing in developing countries, a detailed analysis of the economic and social impacts, case studies, policy recommendations, and a conclusion.<|endoftext|> <|assistant|> The task is to thoroughly examine the intricate relationship between illicit drug production and socio-economic factors in developing nations. The aim is to explore the positives and negatives of drug manufacturing on the economy and to investigate the social impacts on local communities such as addiction and criminal activity. Through case studies, a comprehensive analysis will be conducted to inform recommendations on appropriate policies aimed at reducing the negative effects of drug production. The report is expected to comprise an introduction, a comprehensive review of illicit drug manufacturing in developing countries, a thorough analysis of the economic and social impacts, case studies, policy recommendations, and a conclusion.<|endoftext|>
2023-11-14 02:09:49 - INFO - main - Sample 144194 of the processed training set:
<|system|>
You are an AI assistant. Respond with accuracy and coherence.<|endoftext|>
<|user|>
I am looking for a way to calculate how similar each number is compared to every other number in a set. Here's an example:
[61, 62, 59, 60, 63, 62, 58] would have a high "similarity score", where as [12, 96, 71, 66, 108, 23, 5, 59] would have a low "similarity score".
Is there a term for this, and how would I calculate it?<|endoftext|> <|assistant|> Yes, the term you may be looking for is "variance" or "standard deviation." Variance and standard deviation are statistical measures that indicate the dispersion or spread of a set of numbers. A smaller variance or standard deviation implies that the numbers are closer together, whereas a larger one implies the numbers are more spread out.
In your case, you can use standard deviation as a "similarity score." Lower standard deviation means higher similarity, and higher standard deviation means lower similarity.
Here's how to calculate the standard deviation:
Using the two arrays you gave. Here's a step-by-step example of computing the standard deviation:
Let's start with the first array [61, 62, 59, 60, 63, 62, 58]:
Calculate the mean (average) of the numbers in the dataset. (61 + 62 + 59 + 60 + 63 + 62 + 58) / 7 = 425 / 7 = 60.714
Subtract the mean from each number and square the result. (61 - 60.714)^2 = 0.085 (62 - 60.714)^2 = 1.647 (59 - 60.714)^2 = 2.947 (60 - 60.714)^2 = 0.510 (63 - 60.714)^2 = 5.247 (62 - 60.714)^2 = 1.647 (58 - 60.714)^2 = 7.347
Calculate the mean of the squared differences. (0.085 + 1.647 + 2.947 + 0.510 + 5.247 + 1.647 + 7.347) / 7 = 18.48 / 7 = 2.640
Take the square root of the mean of the squared differences. sqrt(2.640) = 1.625
The standard deviation for the first array is approximately 1.625.
Now for the second array [12, 96, 71, 66, 108, 23, 5, 59]:
Calculate the mean (average) of the numbers in the dataset. (12 + 96 + 71 + 66 + 108 + 23 + 5 + 59) / 8 = 440 / 8 = 55
Subtract the mean from each number and square the result. (12 - 55)^2 = 1849 (96 - 55)^2 = 1681 (71 - 55)^2 = 256 (66 - 55)^2 = 121 (108 - 55)^2 = 2809 (23 - 55)^2 = 1024 (5 - 55)^2 = 2500 (59 - 55)^2 = 16
Calculate the mean of the squared differences. (1849 + 1681 + 256 + 121 + 2809 + 1024 + 2500 + 16) / 8 = 10256 / 8 = 1282
Take the square root of the mean of the squared differences. sqrt(1282) = 35.818
The standard deviation for the second array is approximately 35.818.
So, the standard deviation (similarity score) for the first array is 1.625, and for the second array, it is 35.818. A lower score indicates higher similarity.<|endoftext|> <|user|> Is standart deviation the thing that shows up in graph as the shaded part of projections to show uncertainty?<|endoftext|> <|assistant|> Yes, that is correct. The standard deviation is a measure of how spread out data is. It is calculated by taking the square root of the variance. The variance is a measure of how much the data points vary from the mean.
The standard deviation is often used to show uncertainty in projections. For example, if you have a projection that shows the average temperature will be 50 degrees Fahrenheit, with a standard deviation of 5 degrees Fahrenheit, that means that the temperature could be anywhere from 45 degrees Fahrenheit to 55 degrees Fahrenheit.
The standard deviation is a useful tool for understanding uncertainty. It can help you to make decisions about how to plan for the future.<|endoftext|>
++++++++++++++++++++++++++++++++++++++
YiTokenizer(name_or_path='01-ai/Yi-6B', vocab_size=64000, model_max_length=4096, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<|startoftext|>', 'eos_token': '<|endoftext|>', 'unk_token': 'AutoModelForCausalLM
or a PeftModel
(if you passed a peft_config
) for you.
warnings.warn(
[INFO|configuration_utils.py:717] 2023-11-14 02:09:53,295 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/config.json
[INFO|configuration_utils.py:717] 2023-11-14 02:09:53,384 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/config.json
[INFO|configuration_utils.py:777] 2023-11-14 02:09:53,386 >> Model config YiConfig {
"_name_or_path": "01-ai/Yi-6B",
"architectures": [
"YiForCausalLM"
],
"auto_map": {
"AutoConfig": "01-ai/Yi-6B--configuration_yi.YiConfig",
"AutoModel": "01-ai/Yi-6B--modeling_yi.YiModel",
"AutoModelForCausalLM": "01-ai/Yi-6B--modeling_yi.YiForCausalLM"
},
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_type": "Yi",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 4,
"pad_token_id": 0,
"rms_norm_eps": 1e-05,
"rope_theta": 5000000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.35.0",
"use_cache": true,
"vocab_size": 64000
}
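The parameter count can be cross-checked from the YiConfig values printed above. This sketch assumes a LLaMA-style layout (grouped-query attention, a gated MLP with three projections, two RMSNorm weights per layer plus a final norm, untied embeddings, no bias terms), which is an assumption about the modeling code rather than something the log states:

```python
# Values copied from the YiConfig dump in the log.
cfg = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_attention_heads": 32,
    "num_hidden_layers": 32,
    "num_key_value_heads": 4,
    "vocab_size": 64000,
}


def count_params(cfg: dict) -> dict:
    """Count weights under the assumed LLaMA-style layout (no biases)."""
    h = cfg["hidden_size"]
    layers = cfg["num_hidden_layers"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]

    attn = 2 * h * h + 2 * h * kv_dim        # q_proj and o_proj, then k_proj and v_proj
    mlp = 3 * h * cfg["intermediate_size"]   # gate, up, and down projections
    norms = layers * 2 * h + h               # two norms per layer plus the final norm
    embeddings = 2 * cfg["vocab_size"] * h   # input embedding and untied LM head

    total = layers * (attn + mlp) + norms + embeddings
    return {"total": total, "norms": norms}


counts = count_params(cfg)
print(f"total parameters: {counts['total']:,}")
print(f"norm parameters:  {counts['norms']:,}")
```

Under those assumptions the total comes to roughly 6.06 billion, and the norm weights add up to 266,240, the same figure DeepSpeed reports further down as "Total persistent parameters: 266240 in 65 params".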
[INFO|modeling_utils.py:3121] 2023-11-14 02:09:53,499 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/model.safetensors.index.json
[INFO|modeling_utils.py:1222] 2023-11-14 02:09:53,501 >> Instantiating YiForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:791] 2023-11-14 02:09:53,503 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0
}
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.33s/it]
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.42s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.41s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.42s/it]
[INFO|modeling_utils.py:3950] 2023-11-14 02:09:56,731 >> All model checkpoint weights were used when initializing YiForCausalLM.
[INFO|modeling_utils.py:3958] 2023-11-14 02:09:56,731 >> All the weights of YiForCausalLM were initialized from the model checkpoint at 01-ai/Yi-6B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use YiForCausalLM for predictions without further training.
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.44s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.46s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.46s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:02<00:00, 1.44s/it]
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
[INFO|configuration_utils.py:751] 2023-11-14 02:09:56,841 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--01-ai--Yi-6B/snapshots/5978aa81cd0fb25852004e7a86c71435b3f8de31/generation_config.json
[INFO|configuration_utils.py:791] 2023-11-14 02:09:56,842 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0
}
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/trl/trainer/sft_trainer.py:205: UserWarning: You passed a neftune_noise_alpha argument to the SFTTrainer, the value you passed will override the one in the TrainingArguments.
warnings.warn(
[INFO|trainer.py:593] 2023-11-14 02:09:56,987 >> Using auto half precision backend
2023-11-14 02:09:56 - INFO - main - Train
[2023-11-14 02:09:57,109] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.12.2, git-hash=unknown, git-branch=unknown
[2023-11-14 02:09:59,652] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-11-14 02:09:59,654] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2023-11-14 02:09:59,654] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2023-11-14 02:09:59,673] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW
[2023-11-14 02:09:59,673] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=<class 'torch.optim.adamw.AdamW'>
[2023-11-14 02:09:59,673] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 ZeRO stage 3 optimizer, MiCS is enabled False, Hierarchical params gather False
[2023-11-14 02:09:59,673] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 3 optimizer
[2023-11-14 02:09:59,829] [INFO] [utils.py:802:see_memory_usage] Stage 3 initialize beginning
[2023-11-14 02:09:59,830] [INFO] [utils.py:803:see_memory_usage] MA 11.35 GB Max_MA 11.42 GB CA 11.49 GB Max_CA 11 GB
[2023-11-14 02:09:59,831] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2%
[2023-11-14 02:09:59,832] [INFO] [stage3.py:126:init] Reduce bucket size 500,000,000
[2023-11-14 02:09:59,833] [INFO] [stage3.py:127:init] Prefetch bucket size 50,000,000
[2023-11-14 02:09:59,984] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2023-11-14 02:09:59,985] [INFO] [utils.py:803:see_memory_usage] MA 11.35 GB Max_MA 11.35 GB CA 11.49 GB Max_CA 11 GB
[2023-11-14 02:09:59,986] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2%
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2023-11-14 02:10:00,226] [INFO] [utils.py:802:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2023-11-14 02:10:00,227] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 11.41 GB CA 11.59 GB Max_CA 12 GB
[2023-11-14 02:10:00,227] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2%
[2023-11-14 02:10:00,352] [INFO] [utils.py:802:see_memory_usage] Before creating fp16 partitions
[2023-11-14 02:10:00,353] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 1.47 GB CA 11.59 GB Max_CA 12 GB
[2023-11-14 02:10:00,353] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.63 GB, percent = 1.2%
[2023-11-14 02:10:01,732] [INFO] [utils.py:802:see_memory_usage] After creating fp16 partitions: 2
[2023-11-14 02:10:01,733] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 1.47 GB CA 1.48 GB Max_CA 12 GB
[2023-11-14 02:10:01,733] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 26.43 GB, percent = 1.3%
[2023-11-14 02:10:01,845] [INFO] [utils.py:802:see_memory_usage] Before creating fp32 partitions
[2023-11-14 02:10:01,845] [INFO] [utils.py:803:see_memory_usage] MA 1.47 GB Max_MA 1.47 GB CA 1.48 GB Max_CA 1 GB
[2023-11-14 02:10:01,846] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 26.43 GB, percent = 1.3%
[2023-11-14 02:10:01,964] [INFO] [utils.py:802:see_memory_usage] After creating fp32 partitions
[2023-11-14 02:10:01,964] [INFO] [utils.py:803:see_memory_usage] MA 4.3 GB Max_MA 5.71 GB CA 5.71 GB Max_CA 6 GB
[2023-11-14 02:10:01,965] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 26.43 GB, percent = 1.3%
[2023-11-14 02:10:02,365] [INFO] [utils.py:802:see_memory_usage] Before initializing optimizer states
[2023-11-14 02:10:02,366] [INFO] [utils.py:803:see_memory_usage] MA 4.3 GB Max_MA 4.3 GB CA 5.71 GB Max_CA 6 GB
[2023-11-14 02:10:02,366] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.7 GB, percent = 1.2%
[2023-11-14 02:10:02,553] [INFO] [utils.py:802:see_memory_usage] After initializing optimizer states
[2023-11-14 02:10:02,554] [INFO] [utils.py:803:see_memory_usage] MA 9.94 GB Max_MA 15.59 GB CA 17.0 GB Max_CA 17 GB
[2023-11-14 02:10:02,555] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.83 GB, percent = 1.2%
[2023-11-14 02:10:02,555] [INFO] [stage3.py:460:_setup_for_real_optimizer] optimizer state initialized
[2023-11-14 02:10:02,794] [INFO] [utils.py:802:see_memory_usage] After initializing ZeRO optimizer
[2023-11-14 02:10:02,795] [INFO] [utils.py:803:see_memory_usage] MA 12.28 GB Max_MA 13.26 GB CA 17.0 GB Max_CA 17 GB
[2023-11-14 02:10:02,795] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 23.86 GB, percent = 1.2%
[2023-11-14 02:10:02,795] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW
[2023-11-14 02:10:02,795] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-11-14 02:10:02,796] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2023-11-14 02:10:02,796] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-05, 2e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
[2023-11-14 02:10:02,797] [INFO] [config.py:972:print] DeepSpeedEngine configuration:
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] amp_enabled .................. False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] amp_params ................... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] bfloat16_enabled ............. True
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_parallel_write_pipeline False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_tag_validation_enabled True
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] checkpoint_tag_validation_fail False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f3e8d053e50>
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] communication_data_type ...... None
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] curriculum_enabled_legacy .... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] curriculum_params_legacy ..... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] data_efficiency_enabled ...... False
[2023-11-14 02:10:02,797] [INFO] [config.py:976:print] dataloader_drop_last ......... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] disable_allgather ............ False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] dump_state ................... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] dynamic_loss_scale_args ...... None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_enabled ........... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_gas_boundary_resolution 1
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_layer_num ......... 0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_max_iter .......... 100
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_stability ......... 1e-06
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_tol ............... 0.01
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] eigenvalue_verbose ........... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] elasticity_enabled ........... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_auto_cast ............... None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_enabled ................. False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] fp16_master_weights_and_gradients False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] global_rank .................. 0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] grad_accum_dtype ............. None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_accumulation_steps .. 4
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_clipping ............ 0.0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] gradient_predivide_factor .... 1.0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] initial_dynamic_scale ........ 1
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] load_universal_checkpoint .... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] loss_scale ................... 1.0
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] memory_breakdown ............. False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] mics_hierarchial_params_gather False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] mics_shard_size .............. -1
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_legacy_fusion ...... False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_name ............... None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] optimizer_params ............. None
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pld_enabled .................. False
[2023-11-14 02:10:02,798] [INFO] [config.py:976:print] pld_params ................... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] prescale_gradients ........... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] scheduler_name ............... None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] scheduler_params ............. None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] seq_parallel_communication_data_type torch.float32
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] sparse_attention ............. None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] sparse_gradients_enabled ..... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] steps_per_print .............. inf
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] train_batch_size ............. 32
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] train_micro_batch_size_per_gpu 1
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] use_node_local_storage ....... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] wall_clock_breakdown ......... False
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] weight_quantization_config ... None
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] world_size ................... 8
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_allow_untested_optimizer True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_config .................. stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=True stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_enabled ................. True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_force_ds_cpu_optimizer .. True
[2023-11-14 02:10:02,799] [INFO] [config.py:976:print] zero_optimization_stage ...... 3
[2023-11-14 02:10:02,799] [INFO] [config.py:962:print_user_config] json = {
"train_batch_size": 32,
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": 4,
"zero_optimization": {
"stage": 3,
"offload_optimizer": {
"device": "none",
"nvme_path": null
},
"offload_param": {
"device": "none",
"nvme_path": null
},
"stage3_gather_16bit_weights_on_model_save": true
},
"steps_per_print": inf,
"bf16": {
"enabled": true
},
"fp16": {
"enabled": false
},
"zero_allow_untested_optimizer": true
}
[INFO|trainer.py:1723] 2023-11-14 02:10:02,799 >> Running training
[INFO|trainer.py:1724] 2023-11-14 02:10:02,799 >> Num examples = 285,436
[INFO|trainer.py:1725] 2023-11-14 02:10:02,799 >> Num Epochs = 2
[INFO|trainer.py:1726] 2023-11-14 02:10:02,799 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1729] 2023-11-14 02:10:02,799 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1730] 2023-11-14 02:10:02,799 >> Gradient Accumulation steps = 4
[INFO|trainer.py:1731] 2023-11-14 02:10:02,799 >> Total optimization steps = 17,840
[INFO|trainer.py:1732] 2023-11-14 02:10:02,801 >> Number of trainable parameters = 6,061,035,520
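The numbers in this trainer summary can be reproduced by hand, which also explains the "batch size ends up being 24" observation on the 6-GPU machine: the total train batch size is the per-device batch size times the gradient accumulation steps times the number of processes. The helper names below are hypothetical, and the step-count formula is a simplified approximation of what the Trainer does, assuming the last partial batch is padded across ranks rather than dropped.

```python
import math


def total_train_batch_size(per_device: int, grad_accum: int, world_size: int) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    return per_device * grad_accum * world_size


def planned_optimizer_steps(num_examples: int, per_device: int, grad_accum: int,
                            world_size: int, num_epochs: int) -> int:
    """Approximate optimizer step count planned for the whole run."""
    batches_per_epoch = math.ceil(num_examples / (per_device * world_size))
    steps_per_epoch = batches_per_epoch // grad_accum
    return steps_per_epoch * num_epochs


# Values taken from the log above (8 GPUs).
batch_8_gpus = total_train_batch_size(per_device=1, grad_accum=4, world_size=8)
steps_8_gpus = planned_optimizer_steps(285_436, 1, 4, 8, num_epochs=2)

# Same config on the 6-GPU machine mentioned earlier in the thread.
batch_6_gpus = total_train_batch_size(per_device=1, grad_accum=4, world_size=6)

print(batch_8_gpus, steps_8_gpus, batch_6_gpus)
```

Each GPU still processes one sequence per forward pass in both setups; the larger number only describes how many samples contribute to one optimizer step.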
[INFO|integration_utils.py:718] 2023-11-14 02:10:02,802 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: Currently logged in as: developer-team018 (neural-network-018). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.16.0
wandb: Run data is saved locally in /workspace/alignment-handbook/wandb/run-20231114_021003-8xmy6gtd
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run robust-plasma-26
wandb: ⭐️ View project at https://wandb.ai/neural-network-018/huggingface
wandb: 🚀 View run at https://wandb.ai/neural-network-018/huggingface/runs/8xmy6gtd
0%| | 0/17840 [00:00<?, ?it/s]
[WARNING|tokenization_utils_base.py:3831] 2023-11-14 02:10:23,512 >> Token indices sequence length is longer than the specified maximum sequence length for this model (6114 > 4096). Running this sequence through the model will result in indexing errors
{'loss': 1.7024, 'learning_rate': 1.9999999844947046e-05, 'epoch': 0.0}
{'loss': 1.1507, 'learning_rate': 1.999961237011484e-05, 'epoch': 0.01}
{'loss': 1.0928, 'learning_rate': 1.9998449510510744e-05, 'epoch': 0.01}
{'loss': 1.0793, 'learning_rate': 1.999651151133954e-05, 'epoch': 0.02}
{'loss': 1.0867, 'learning_rate': 1.999379852284651e-05, 'epoch': 0.02}
{'loss': 1.0857, 'learning_rate': 1.999031075535873e-05, 'epoch': 0.03}
{'loss': 1.0721, 'learning_rate': 1.9986048479268788e-05, 'epoch': 0.03}
{'loss': 1.0923, 'learning_rate': 1.99810120250138e-05, 'epoch': 0.04}
{'loss': 1.0836, 'learning_rate': 1.9975201783049804e-05, 'epoch': 0.04}
{'loss': 1.0769, 'learning_rate': 1.9968618203821487e-05, 'epoch': 0.05}
{'loss': 1.0574, 'learning_rate': 1.9961261797727256e-05, 'epoch': 0.06}
{'loss': 1.042, 'learning_rate': 1.9953133135079686e-05, 'epoch': 0.06}
{'loss': 1.0554, 'learning_rate': 1.9944232846061284e-05, 'epoch': 0.07}
{'loss': 1.0735, 'learning_rate': 1.993456162067566e-05, 'epoch': 0.07}
{'loss': 1.0785, 'learning_rate': 1.992412020869401e-05, 'epoch': 0.08}
{'loss': 1.0654, 'learning_rate': 1.9912909419596993e-05, 'epoch': 0.08}
{'loss': 1.0606, 'learning_rate': 1.9900930122511993e-05, 'epoch': 0.09}
{'loss': 1.0664, 'learning_rate': 1.988818324614572e-05, 'epoch': 0.1}
{'loss': 1.0604, 'learning_rate': 1.9874669778712215e-05, 'epoch': 0.1}
{'loss': 1.0674, 'learning_rate': 1.9860390767856244e-05, 'epoch': 0.11}
{'loss': 1.042, 'learning_rate': 1.984534732057208e-05, 'epoch': 0.11}
{'loss': 1.0452, 'learning_rate': 1.9829540603117667e-05, 'epoch': 0.12}
{'loss': 1.0577, 'learning_rate': 1.9812971840924222e-05, 'epoch': 0.12}
{'loss': 1.0471, 'learning_rate': 1.979564231850122e-05, 'epoch': 0.13}
{'loss': 1.0704, 'learning_rate': 1.977755337933682e-05, 'epoch': 0.13}
{'loss': 1.0282, 'learning_rate': 1.9758706425793702e-05, 'epoch': 0.14}
{'loss': 1.0515, 'learning_rate': 1.973910291900036e-05, 'epoch': 0.15}
{'loss': 1.0548, 'learning_rate': 1.97187443787378e-05, 'epoch': 0.15}
8%|██▌ | 1368/17840 [1:50:57<19:16:41, 4.21s/it]
[INFO|trainer.py:3158] 2023-11-14 04:01:02,181 >> Running Evaluation
[INFO|trainer.py:3160] 2023-11-14 04:01:02,182 >> Num examples = 500
[INFO|trainer.py:3163] 2023-11-14 04:01:02,182 >> Batch size = 1
0%| | 0/63 [00:00<?, ?it/s] 3%|█▍ | 2/63 [00:00<00:12, 5.02it/s] 5%|██ | 3/63 [00:00<00:12, 4.76it/s] 6%|██▊ | 4/63 [00:00<00:15, 3.89it/s] 8%|███▍ | 5/63 [00:01<00:16, 3.50it/s] 10%|████▏ | 6/63 [00:01<00:17, 3.30it/s] 11%|████▉ | 7/63 [00:01<00:17, 3.20it/s] 13%|█████▌ | 8/63 [00:02<00:17, 3.12it/s]
{'eval_loss': 1.0247304439544678, 'eval_runtime': 4.5889, 'eval_samples_per_second': 108.959, 'eval_steps_per_second': 13.729, 'epoch': 0.15}
8%|██▌ | 1368/17840 [1:51:02<19:16:41, 4.21s/it]
14%|██████▎ | 9/63 [00:02<00:17, 3.14it/s]
{'loss': 0.9636, 'learning_rate': 1.9697632383321755e-05, 'epoch': 1.0}
{'loss': 0.9026, 'learning_rate': 1.96757685694803e-05, 'epoch': 1.01}
{'loss': 0.8808, 'learning_rate': 1.965315463222695e-05, 'epoch': 1.01}
{'loss': 0.8712, 'learning_rate': 1.9629792324729302e-05, 'epoch': 1.02}
{'loss': 0.8967, 'learning_rate': 1.960568345817306e-05, 'epoch': 1.03}
{'loss': 0.8676, 'learning_rate': 1.9580829901621666e-05, 'epoch': 1.03}
{'loss': 0.8723, 'learning_rate': 1.9555233581871366e-05, 'epoch': 1.04}
{'loss': 0.9122, 'learning_rate': 1.9528896483301866e-05, 'epoch': 1.04}
{'loss': 0.8687, 'learning_rate': 1.9501820647722458e-05, 'epoch': 1.05}
{'loss': 0.8726, 'learning_rate': 1.947400817421375e-05, 'epoch': 1.05}
{'loss': 0.8505, 'learning_rate': 1.944546121896493e-05, 'epoch': 1.06}
{'loss': 0.8458, 'learning_rate': 1.9416181995106585e-05, 'epoch': 1.07}
{'loss': 0.8721, 'learning_rate': 1.9386172772539162e-05, 'epoch': 1.07}
{'loss': 0.8676, 'learning_rate': 1.9355435877756957e-05, 'epoch': 1.08}
{'loss': 0.8826, 'learning_rate': 1.9323973693667762e-05, 'epoch': 1.08}
{'loss': 0.8607, 'learning_rate': 1.929178865940815e-05, 'epoch': 1.09}
{'loss': 0.8561, 'learning_rate': 1.925888327015434e-05, 'epoch': 1.09}
{'loss': 0.8687, 'learning_rate': 1.9225260076928783e-05, 'epoch': 1.1}
{'loss': 0.874, 'learning_rate': 1.919092168640239e-05, 'epoch': 1.1}
{'loss': 0.8563, 'learning_rate': 1.915587076069243e-05, 'epoch': 1.11}
{'loss': 0.8445, 'learning_rate': 1.9120110017156172e-05, 'epoch': 1.12}
{'loss': 0.8646, 'learning_rate': 1.908364222818019e-05, 'epoch': 1.12}
{'loss': 0.8479, 'learning_rate': 1.9046470220965457e-05, 'epoch': 1.13}
{'loss': 0.8788, 'learning_rate': 1.9008596877308157e-05, 'epoch': 1.13}
{'loss': 0.9, 'learning_rate': 1.8970025133376252e-05, 'epoch': 1.14}
{'loss': 0.8791, 'learning_rate': 1.893075797948188e-05, 'epoch': 1.14}
{'loss': 0.9254, 'learning_rate': 1.889079845984951e-05, 'epoch': 1.15}
15%|█████ | 2736/17840 [3:42:25<17:42:31, 4.22s/it]
[INFO|trainer.py:3158] 2023-11-14 05:52:30,316 >> Running Evaluation
[INFO|trainer.py:3160] 2023-11-14 05:52:30,317 >> Num examples = 500
[INFO|trainer.py:3163] 2023-11-14 05:52:30,317 >> Batch size = 1
0%| | 0/63 [00:00<?, ?it/s] 3%|█▍ | 2/63 [00:00<00:10, 6.07it/s] 5%|██ | 3/63 [00:00<00:14, 4.20it/s] 6%|██▊ | 4/63 [00:01<00:16, 3.63it/s] 8%|███▍ | 5/63 [00:01<00:17, 3.37it/s] 10%|████▏ | 6/63 [00:01<00:17, 3.23it/s] 11%|████▉ | 7/63 [00:02<00:17, 3.16it/s] 13%|█████▌ | 8/63 [00:02<00:17, 3.06it/s]
{'eval_loss': 1.0676991939544678, 'eval_runtime': 4.5191, 'eval_samples_per_second': 110.641, 'eval_steps_per_second': 13.941, 'epoch': 1.15}
15%|█████ | 2736/17840 [3:42:30<17:42:31, 4.22s/it]
14%|██████▎ | 9/63 [00:02<00:17, 3.09it/s]
[INFO|trainer.py:1955] 2023-11-14 05:52:34,837 >> Training completed. Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 13352.0365, 'train_samples_per_second': 42.755, 'train_steps_per_second': 1.336, 'train_loss': 0.9719247023264567, 'epoch': 1.15}
15%|█████ | 2736/17840 [3:42:30<20:28:20, 4.88s/it]
train metrics
epoch = 1.15
train_loss = 0.9719
train_runtime = 3:42:32.03
train_samples = 285436
train_samples_per_second = 42.755
train_steps_per_second = 1.336
2023-11-14 05:52:34 - INFO - main - Evaluate
[INFO|trainer.py:3158] 2023-11-14 05:52:34,843 >> Running Evaluation
[INFO|trainer.py:3160] 2023-11-14 05:52:34,843 >> Num examples = 500
[INFO|trainer.py:3163] 2023-11-14 05:52:34,844 >> Batch size = 1
14%|██████▎ | 9/63 [00:02<00:16, 3.23it/s]
eval metrics
epoch = 1.15
eval_loss = 1.0677
eval_runtime = 0:00:04.48
eval_samples = 500
eval_samples_per_second = 111.451
eval_steps_per_second = 14.043
2023-11-14 05:52:39 - INFO - main - Save model
[INFO|trainer.py:2881] 2023-11-14 05:52:43,590 >> Saving model checkpoint to data/apt-chat-yi-6B-sft-full
[INFO|configuration_utils.py:461] 2023-11-14 05:52:43,592 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
[INFO|configuration_utils.py:564] 2023-11-14 05:52:43,592 >> Configuration saved in data/apt-chat-yi-6B-sft-full/generation_config.json
[INFO|modeling_utils.py:2201] 2023-11-14 05:52:51,334 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at data/apt-chat-yi-6B-sft-full/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2428] 2023-11-14 05:52:51,336 >> tokenizer config file saved in data/apt-chat-yi-6B-sft-full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-11-14 05:52:51,337 >> Special tokens file saved in data/apt-chat-yi-6B-sft-full/special_tokens_map.json
[INFO|trainer.py:2881] 2023-11-14 05:52:55,599 >> Saving model checkpoint to data/apt-chat-yi-6B-sft-full
[INFO|configuration_utils.py:461] 2023-11-14 05:52:55,601 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
[INFO|configuration_utils.py:564] 2023-11-14 05:52:55,601 >> Configuration saved in data/apt-chat-yi-6B-sft-full/generation_config.json
[INFO|modeling_utils.py:2201] 2023-11-14 05:53:06,302 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at data/apt-chat-yi-6B-sft-full/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2428] 2023-11-14 05:53:06,303 >> tokenizer config file saved in data/apt-chat-yi-6B-sft-full/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-11-14 05:53:06,304 >> Special tokens file saved in data/apt-chat-yi-6B-sft-full/special_tokens_map.json
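The step counts in the log above line up with the odd epoch values. A small check, using only numbers printed in the log, suggests that each pass over the data ended after 1368 optimizer steps instead of the planned 8920, so the run finished two short passes and reported "Training completed" at step 2736. The variable names are only for illustration, and the log itself does not show why the passes were shorter than planned.

```python
# Numbers copied from the trainer log; nothing here is measured independently.
planned_total_steps = 17_840    # "Total optimization steps"
num_epochs = 2                  # "Num Epochs"
steps_at_first_eval = 1_368     # progress bar position at the first evaluation
steps_at_stop = 2_736           # progress bar position at "Training completed"

planned_steps_per_epoch = planned_total_steps // num_epochs

# Fraction of a planned epoch actually covered by one pass over the data.
fraction_per_pass = steps_at_first_eval / planned_steps_per_epoch

# Number of passes completed when training stopped.
passes_completed = steps_at_stop / steps_at_first_eval

print(f"planned steps per epoch : {planned_steps_per_epoch}")
print(f"fraction per pass       : {fraction_per_pass:.2f}")
print(f"passes completed        : {passes_completed:.0f}")
```

This matches the logged epoch values: 0.15 at the first evaluation, a jump to 1.0 at the start of the second pass, and 1.15 when training stops.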
Upload 4 LFS files: 0%| | 0/4 [00:00<?, ?it/s]
training_args.bin: 100%|███████████████████| 5.62k/5.62k [00:00<00:00, 15.4kB/s]
model-00001-of-00003.safetensors: 16%|▊ | 800M/4.93G [00:22<01:42, 40.5MB/s]
model-00002-of-00003.safetensors: 14%|▋ | 704M/4.98G [00:22<01:34, 45.1MB/s]
model-00003-of-00003.safetensors: 35%|█▋ | 768M/2.21G [00:22<00:32, 44.6MB/s]
model-00001-of-00003.safetensors: 17%|▊ | 816M/4.93G [00:22<01:40, 40.8MB/s] model-00002-of-00003.safetensors: 14%|▋ | 720M/4.98G [00:22<01:28, 47.9MB/s]
model-00001-of-00003.safetensors: 17%|▊ | 832M/4.93G [00:22<01:40, 40.8MB/s] model-00002-of-00003.safetensors: 15%|▋ | 736M/4.98G [00:23<01:39, 42.6MB/s]
model-00001-of-00003.safetensors: 17%|▊ | 848M/4.93G [00:23<01:34, 43.1MB/s] model-00001-of-00003.safetensors: 18%|▉ | 864M/4.93G [00:23<01:28, 45.8MB/s]
model-00003-of-00003.safetensors: 38%|█▉ | 832M/2.21G [00:23<00:30, 46.0MB/s]
model-00003-of-00003.safetensors: 38%|█▉ | 848M/2.21G [00:23<00:28, 47.4MB/s] model-00001-of-00003.safetensors: 18%|▉ | 880M/4.93G [00:23<01:38, 41.1MB/s]
model-00003-of-00003.safetensors: 39%|█▉ | 864M/2.21G [00:24<00:26, 50.4MB/s] model-00002-of-00003.safetensors: 16%|▊ | 784M/4.98G [00:24<01:44, 40.1MB/s]
model-00003-of-00003.safetensors: 40%|█▉ | 880M/2.21G [00:24<00:25, 51.4MB/s] model-00002-of-00003.safetensors: 16%|▊ | 800M/4.98G [00:24<01:40, 41.7MB/s]
model-00001-of-00003.safetensors: 18%|▉ | 896M/4.93G [00:24<02:22, 28.2MB/s] model-00002-of-00003.safetensors: 16%|▊ | 816M/4.98G [00:24<01:34, 44.0MB/s]
model-00001-of-00003.safetensors: 18%|▉ | 912M/4.93G [00:25<02:03, 32.7MB/s] model-00002-of-00003.safetensors: 17%|▊ | 832M/4.98G [00:25<01:30, 46.0MB/s]
model-00001-of-00003.safetensors: 19%|▉ | 928M/4.93G [00:25<01:55, 34.7MB/s] model-00002-of-00003.safetensors: 17%|▊ | 848M/4.98G [00:25<01:30, 45.6MB/s]
model-00003-of-00003.safetensors: 43%|██▏ | 944M/2.21G [00:25<00:28, 43.9MB/s] model-00001-of-00003.safetensors: 19%|▉ | 944M/4.93G [00:26<01:48, 36.7MB/s]
model-00003-of-00003.safetensors: 43%|██▏ | 960M/2.21G [00:26<00:26, 46.4MB/s] model-00001-of-00003.safetensors: 19%|▉ | 960M/4.93G [00:26<01:40, 39.5MB/s]
model-00003-of-00003.safetensors: 44%|██▏ | 976M/2.21G [00:26<00:24, 49.5MB/s] model-00001-of-00003.safetensors: 20%|▉ | 976M/4.93G [00:26<01:36, 41.1MB/s]
model-00003-of-00003.safetensors: 45%|██▏ | 992M/2.21G [00:26<00:25, 47.4MB/s] model-00001-of-00003.safetensors: 20%|▊ | 1.01G/4.93G [00:27<01:24, 46.5MB/s] model-00001-of-00003.safetensors: 21%|▊ | 1.02G/4.93G [00:27<01:21, 47.8MB/s] model-00001-of-00003.safetensors: 21%|▊ | 1.04G/4.93G [00:28<01:27, 44.7MB/s] model-00002-of-00003.safetensors: 19%|▉ | 960M/4.98G [00:28<01:26, 46.2MB/s] model-00001-of-00003.safetensors: 22%|▊ | 1.06G/4.93G [00:28<01:31, 42.2MB/s] model-00001-of-00003.safetensors: 22%|▊ | 1.07G/4.93G [00:28<01:44, 36.9MB/s] model-00002-of-00003.safetensors: 20%|▊ | 1.01G/4.98G [00:28<01:20, 49.5MB/s]
model-00003-of-00003.safetensors: 46%|█▊ | 1.01G/2.21G [00:29<01:10, 17.0MB/s] model-00001-of-00003.safetensors: 22%|▊ | 1.07G/4.93G [00:29<02:47, 23.0MB/s]
model-00001-of-00003.safetensors: 22%|▉ | 1.09G/4.93G [00:29<02:02, 31.4MB/s] model-00001-of-00003.safetensors: 22%|▉ | 1.10G/4.93G [00:29<01:46, 35.9MB/s] model-00002-of-00003.safetensors: 21%|▊ | 1.06G/4.98G [00:30<01:24, 46.5MB/s]
model-00001-of-00003.safetensors: 23%|▉ | 1.12G/4.93G [00:30<01:33, 40.9MB/s] model-00001-of-00003.safetensors: 23%|▉ | 1.14G/4.93G [00:30<01:25, 44.4MB/s] model-00002-of-00003.safetensors: 22%|▊ | 1.09G/4.98G [00:30<01:19, 48.9MB/s]
model-00001-of-00003.safetensors: 23%|▉ | 1.15G/4.93G [00:30<01:23, 45.3MB/s] model-00002-of-00003.safetensors: 22%|▉ | 1.10G/4.98G [00:31<01:22, 46.9MB/s]
model-00001-of-00003.safetensors: 24%|▉ | 1.17G/4.93G [00:31<01:22, 45.6MB/s] model-00002-of-00003.safetensors: 23%|▉ | 1.12G/4.98G [00:31<01:18, 49.4MB/s] model-00002-of-00003.safetensors: 23%|▉ | 1.14G/4.98G [00:31<01:11, 53.8MB/s]
model-00001-of-00003.safetensors: 24%|▉ | 1.18G/4.93G [00:31<01:25, 43.7MB/s] model-00002-of-00003.safetensors: 23%|▉ | 1.15G/4.98G [00:31<01:13, 52.0MB/s]
model-00001-of-00003.safetensors: 24%|▉ | 1.20G/4.93G [00:32<01:27, 42.9MB/s]
model-00003-of-00003.safetensors: 51%|██ | 1.12G/2.21G [00:32<00:29, 36.5MB/s] model-00001-of-00003.safetensors: 25%|▉ | 1.22G/4.93G [00:32<01:30, 41.2MB/s] model-00002-of-00003.safetensors: 24%|▉ | 1.18G/4.98G [00:32<01:20, 47.3MB/s]
model-00001-of-00003.safetensors: 25%|▉ | 1.23G/4.93G [00:32<01:24, 43.7MB/s]
model-00001-of-00003.safetensors: 25%|█ | 1.25G/4.93G [00:33<01:23, 44.1MB/s] model-00002-of-00003.safetensors: 24%|▉ | 1.20G/4.98G [00:33<01:46, 35.5MB/s]
model-00001-of-00003.safetensors: 26%|█ | 1.26G/4.93G [00:33<01:20, 45.3MB/s] model-00002-of-00003.safetensors: 24%|▉ | 1.22G/4.98G [00:33<01:38, 38.1MB/s]
model-00001-of-00003.safetensors: 26%|█ | 1.28G/4.93G [00:33<01:25, 42.8MB/s]
model-00001-of-00003.safetensors: 26%|█ | 1.30G/4.93G [00:34<01:25, 42.4MB/s] model-00002-of-00003.safetensors: 25%|▉ | 1.23G/4.98G [00:34<02:02, 30.6MB/s]
model-00001-of-00003.safetensors: 27%|█ | 1.31G/4.93G [00:34<01:23, 43.6MB/s] model-00002-of-00003.safetensors: 25%|█ | 1.25G/4.98G [00:34<01:48, 34.4MB/s]
model-00001-of-00003.safetensors: 27%|█ | 1.33G/4.93G [00:34<01:16, 47.0MB/s]
model-00003-of-00003.safetensors: 56%|██▎ | 1.25G/2.21G [00:35<00:19, 49.2MB/s] model-00002-of-00003.safetensors: 25%|█ | 1.26G/4.98G [00:35<01:40, 36.9MB/s]
model-00003-of-00003.safetensors: 57%|██▎ | 1.26G/2.21G [00:35<00:19, 49.7MB/s] model-00001-of-00003.safetensors: 27%|█ | 1.34G/4.93G [00:35<01:30, 39.7MB/s]
model-00001-of-00003.safetensors: 28%|█ | 1.36G/4.93G [00:35<01:20, 44.3MB/s] model-00001-of-00003.safetensors: 28%|█ | 1.38G/4.93G [00:36<01:16, 46.4MB/s]
model-00003-of-00003.safetensors: 59%|██▎ | 1.30G/2.21G [00:36<00:18, 48.6MB/s] model-00002-of-00003.safetensors: 26%|█ | 1.31G/4.98G [00:36<01:29, 41.1MB/s]
model-00003-of-00003.safetensors: 59%|██▎ | 1.31G/2.21G [00:36<00:19, 46.6MB/s] model-00001-of-00003.safetensors: 28%|█▏ | 1.39G/4.93G [00:36<01:25, 41.2MB/s]
model-00001-of-00003.safetensors: 29%|█▏ | 1.41G/4.93G [00:36<01:21, 43.4MB/s] model-00002-of-00003.safetensors: 27%|█ | 1.34G/4.98G [00:36<01:24, 42.9MB/s]
model-00001-of-00003.safetensors: 29%|█▏ | 1.42G/4.93G [00:37<01:16, 45.7MB/s] model-00002-of-00003.safetensors: 27%|█ | 1.36G/4.98G [00:37<01:21, 44.5MB/s]
model-00001-of-00003.safetensors: 29%|█▏ | 1.44G/4.93G [00:37<01:14, 47.0MB/s] model-00002-of-00003.safetensors: 28%|█ | 1.38G/4.98G [00:37<01:22, 43.8MB/s]
model-00001-of-00003.safetensors: 30%|█▏ | 1.46G/4.93G [00:37<01:11, 48.5MB/s] model-00002-of-00003.safetensors: 28%|█ | 1.39G/4.98G [00:37<01:19, 45.0MB/s]
model-00003-of-00003.safetensors: 63%|██▌ | 1.39G/2.21G [00:38<00:18, 44.3MB/s] model-00002-of-00003.safetensors: 28%|█▏ | 1.41G/4.98G [00:38<01:15, 47.6MB/s]
model-00001-of-00003.safetensors: 30%|█▏ | 1.47G/4.93G [00:38<01:39, 34.6MB/s] model-00001-of-00003.safetensors: 30%|█▏ | 1.49G/4.93G [00:38<01:28, 38.8MB/s]
model-00003-of-00003.safetensors: 64%|██▌ | 1.42G/2.21G [00:38<00:16, 46.5MB/s] model-00001-of-00003.safetensors: 30%|█▏ | 1.50G/4.93G [00:39<01:25, 40.0MB/s]
model-00003-of-00003.safetensors: 65%|██▌ | 1.44G/2.21G [00:39<00:17, 45.0MB/s] model-00002-of-00003.safetensors: 29%|█▏ | 1.46G/4.98G [00:39<01:17, 45.5MB/s]
model-00001-of-00003.safetensors: 31%|█▏ | 1.52G/4.93G [00:39<01:23, 40.8MB/s] model-00002-of-00003.safetensors: 30%|█▏ | 1.47G/4.98G [00:39<01:17, 45.0MB/s]
model-00003-of-00003.safetensors: 67%|██▋ | 1.47G/2.21G [00:39<00:15, 47.2MB/s] model-00001-of-00003.safetensors: 31%|█▏ | 1.54G/4.93G [00:40<01:31, 37.2MB/s]
model-00001-of-00003.safetensors: 31%|█▎ | 1.55G/4.93G [00:40<01:20, 41.8MB/s] model-00002-of-00003.safetensors: 30%|█▏ | 1.50G/4.98G [00:40<01:26, 40.1MB/s]
model-00001-of-00003.safetensors: 32%|█▎ | 1.57G/4.93G [00:40<01:20, 41.9MB/s] model-00002-of-00003.safetensors: 31%|█▏ | 1.52G/4.98G [00:41<01:31, 37.6MB/s]
model-00001-of-00003.safetensors: 32%|█▎ | 1.58G/4.93G [00:41<01:30, 37.2MB/s] model-00002-of-00003.safetensors: 31%|█▏ | 1.54G/4.98G [00:41<01:22, 41.8MB/s]
model-00001-of-00003.safetensors: 32%|█▎ | 1.60G/4.93G [00:41<01:21, 40.8MB/s] model-00002-of-00003.safetensors: 31%|█▏ | 1.55G/4.98G [00:41<01:16, 44.6MB/s]
model-00003-of-00003.safetensors: 70%|██▊ | 1.55G/2.21G [00:41<00:15, 42.9MB/s] model-00001-of-00003.safetensors: 33%|█▎ | 1.62G/4.93G [00:41<01:17, 42.7MB/s]
model-00003-of-00003.safetensors: 71%|██▊ | 1.57G/2.21G [00:42<00:15, 42.7MB/s] model-00001-of-00003.safetensors: 33%|█▎ | 1.63G/4.93G [00:42<01:17, 42.8MB/s]
model-00001-of-00003.safetensors: 33%|█▎ | 1.65G/4.93G [00:42<01:14, 44.2MB/s] model-00002-of-00003.safetensors: 32%|█▎ | 1.60G/4.98G [00:42<01:14, 45.4MB/s]
model-00003-of-00003.safetensors: 72%|██▉ | 1.60G/2.21G [00:42<00:13, 45.8MB/s] model-00001-of-00003.safetensors: 34%|█▎ | 1.66G/4.93G [00:42<01:12, 45.0MB/s] model-00002-of-00003.safetensors: 33%|█▎ | 1.63G/4.98G [00:43<01:09, 48.0MB/s]
model-00001-of-00003.safetensors: 34%|█▎ | 1.68G/4.93G [00:43<01:11, 45.6MB/s] model-00002-of-00003.safetensors: 33%|█▎ | 1.65G/4.98G [00:43<01:09, 47.6MB/s]
model-00001-of-00003.safetensors: 34%|█▍ | 1.70G/4.93G [00:43<01:17, 41.7MB/s] model-00002-of-00003.safetensors: 33%|█▎ | 1.66G/4.98G [00:43<01:07, 49.2MB/s]
model-00001-of-00003.safetensors: 35%|█▍ | 1.71G/4.93G [00:44<01:16, 42.0MB/s] model-00002-of-00003.safetensors: 34%|█▎ | 1.68G/4.98G [00:44<01:07, 49.0MB/s]
model-00001-of-00003.safetensors: 35%|█▍ | 1.73G/4.93G [00:44<01:11, 44.9MB/s] model-00001-of-00003.safetensors: 35%|█▍ | 1.74G/4.93G [00:44<01:08, 46.4MB/s]
model-00003-of-00003.safetensors: 76%|███ | 1.68G/2.21G [00:44<00:12, 41.2MB/s] model-00001-of-00003.safetensors: 36%|█▍ | 1.76G/4.93G [00:45<01:09, 45.8MB/s]
model-00003-of-00003.safetensors: 77%|███ | 1.70G/2.21G [00:45<00:11, 43.5MB/s] model-00002-of-00003.safetensors: 35%|█▍ | 1.73G/4.98G [00:45<01:13, 44.4MB/s]
model-00001-of-00003.safetensors: 36%|█▍ | 1.78G/4.93G [00:45<01:16, 41.2MB/s] model-00001-of-00003.safetensors: 36%|█▍ | 1.79G/4.93G [00:45<01:11, 44.1MB/s] model-00002-of-00003.safetensors: 35%|█▍ | 1.76G/4.98G [00:46<01:20, 40.0MB/s]
model-00001-of-00003.safetensors: 37%|█▍ | 1.81G/4.93G [00:46<01:10, 44.2MB/s]
model-00003-of-00003.safetensors: 79%|███▏| 1.74G/2.21G [00:46<00:12, 36.7MB/s] model-00001-of-00003.safetensors: 37%|█▍ | 1.82G/4.93G [00:46<01:24, 36.7MB/s] model-00002-of-00003.safetensors: 36%|█▍ | 1.79G/4.98G [00:46<01:13, 43.4MB/s] model-00001-of-00003.safetensors: 37%|█▍ | 1.84G/4.93G [00:47<01:21, 37.9MB/s]
model-00003-of-00003.safetensors: 80%|███▏| 1.76G/2.21G [00:47<00:14, 30.9MB/s]
model-00003-of-00003.safetensors: 80%|███▏| 1.78G/2.21G [00:47<00:12, 35.4MB/s] model-00001-of-00003.safetensors: 38%|█▌ | 1.86G/4.93G [00:47<01:25, 36.1MB/s]
model-00001-of-00003.safetensors: 38%|█▌ | 1.87G/4.93G [00:48<01:14, 41.1MB/s] model-00002-of-00003.safetensors: 37%|█▍ | 1.84G/4.98G [00:48<01:16, 40.9MB/s]
model-00003-of-00003.safetensors: 82%|███▎| 1.81G/2.21G [00:48<00:09, 40.9MB/s] model-00002-of-00003.safetensors: 37%|█▍ | 1.86G/4.98G [00:48<01:15, 41.2MB/s]
model-00003-of-00003.safetensors: 82%|███▎| 1.82G/2.21G [00:48<00:09, 40.1MB/s] model-00002-of-00003.safetensors: 38%|█▌ | 1.87G/4.98G [00:48<01:12, 42.6MB/s]
model-00001-of-00003.safetensors: 38%|█▌ | 1.89G/4.93G [00:49<01:52, 27.1MB/s] model-00002-of-00003.safetensors: 38%|█▌ | 1.89G/4.98G [00:49<01:09, 44.5MB/s]
model-00001-of-00003.safetensors: 39%|█▌ | 1.90G/4.93G [00:49<01:39, 30.4MB/s] model-00002-of-00003.safetensors: 38%|█▌ | 1.90G/4.98G [00:49<01:09, 44.2MB/s]
model-00001-of-00003.safetensors: 39%|█▌ | 1.92G/4.93G [00:49<01:29, 33.6MB/s]
model-00003-of-00003.safetensors: 85%|███▍| 1.89G/2.21G [00:49<00:06, 50.3MB/s] model-00001-of-00003.safetensors: 39%|█▌ | 1.94G/4.93G [00:50<01:21, 36.6MB/s] model-00002-of-00003.safetensors: 39%|█▌ | 1.94G/4.98G [00:50<01:04, 46.8MB/s]
model-00001-of-00003.safetensors: 40%|█▌ | 1.95G/4.93G [00:50<01:18, 38.2MB/s] model-00002-of-00003.safetensors: 39%|█▌ | 1.95G/4.98G [00:50<01:08, 43.9MB/s]
model-00003-of-00003.safetensors: 87%|███▍| 1.92G/2.21G [00:50<00:06, 47.6MB/s] model-00001-of-00003.safetensors: 40%|█▌ | 1.97G/4.93G [00:50<01:17, 38.3MB/s] model-00001-of-00003.safetensors: 40%|█▌ | 1.98G/4.93G [00:51<01:13, 40.0MB/s]
model-00003-of-00003.safetensors: 87%|███▍| 1.94G/2.21G [00:51<00:08, 33.1MB/s] model-00001-of-00003.safetensors: 41%|█▌ | 2.00G/4.93G [00:51<01:08, 43.1MB/s]
model-00003-of-00003.safetensors: 88%|███▌| 1.95G/2.21G [00:51<00:07, 36.7MB/s] model-00001-of-00003.safetensors: 41%|█▋ | 2.02G/4.93G [00:51<01:04, 45.1MB/s]
model-00001-of-00003.safetensors: 41%|█▋ | 2.03G/4.93G [00:52<01:00, 47.6MB/s] model-00002-of-00003.safetensors: 41%|█▋ | 2.03G/4.98G [00:52<01:06, 44.0MB/s]
model-00001-of-00003.safetensors: 42%|█▋ | 2.05G/4.93G [00:52<01:01, 46.7MB/s] model-00002-of-00003.safetensors: 41%|█▋ | 2.05G/4.98G [00:52<01:06, 44.2MB/s]
model-00001-of-00003.safetensors: 42%|█▋ | 2.06G/4.93G [00:52<01:00, 47.8MB/s] model-00002-of-00003.safetensors: 41%|█▋ | 2.06G/4.98G [00:52<01:03, 46.1MB/s]
model-00003-of-00003.safetensors: 91%|███▋| 2.02G/2.21G [00:53<00:04, 47.1MB/s] model-00001-of-00003.safetensors: 42%|█▋ | 2.08G/4.93G [00:53<01:11, 40.2MB/s] model-00001-of-00003.safetensors: 43%|█▋ | 2.11G/4.93G [00:54<01:03, 44.2MB/s] model-00002-of-00003.safetensors: 42%|█▋ | 2.11G/4.98G [00:54<01:06, 42.9MB/s]
model-00001-of-00003.safetensors: 43%|█▋ | 2.13G/4.93G [00:54<01:00, 46.5MB/s] model-00002-of-00003.safetensors: 43%|█▋ | 2.13G/4.98G [00:54<01:04, 44.4MB/s]
model-00001-of-00003.safetensors: 44%|█▊ | 2.18G/4.93G [00:55<00:51, 53.6MB/s]
model-00001-of-00003.safetensors: 44%|█▊ | 2.19G/4.93G [00:55<00:54, 49.9MB/s]
model-00001-of-00003.safetensors: 45%|█▊ | 2.21G/4.93G [00:55<00:53, 51.3MB/s] model-00002-of-00003.safetensors: 43%|█▋ | 2.14G/4.98G [00:55<02:02, 23.2MB/s]
model-00003-of-00003.safetensors: 95%|███▊| 2.10G/2.21G [00:56<00:03, 31.2MB/s] model-00001-of-00003.safetensors: 45%|█▊ | 2.22G/4.93G [00:56<01:02, 43.4MB/s] model-00002-of-00003.safetensors: 44%|█▋ | 2.18G/4.98G [00:56<01:29, 31.2MB/s]
model-00001-of-00003.safetensors: 45%|█▊ | 2.24G/4.93G [00:56<00:59, 45.6MB/s] model-00002-of-00003.safetensors: 44%|█▊ | 2.19G/4.98G [00:56<01:20, 34.4MB/s]
model-00001-of-00003.safetensors: 46%|█▊ | 2.26G/4.93G [00:57<00:56, 47.5MB/s]
model-00003-of-00003.safetensors: 97%|███▉| 2.14G/2.21G [00:57<00:01, 39.7MB/s] model-00001-of-00003.safetensors: 46%|█▊ | 2.27G/4.93G [00:57<00:57, 45.9MB/s]
model-00001-of-00003.safetensors: 46%|█▊ | 2.29G/4.93G [00:57<00:53, 49.0MB/s] model-00002-of-00003.safetensors: 45%|█▊ | 2.22G/4.98G [00:57<01:11, 38.7MB/s]
model-00001-of-00003.safetensors: 47%|█▊ | 2.30G/4.93G [00:58<00:54, 48.3MB/s] model-00002-of-00003.safetensors: 45%|█▊ | 2.24G/4.98G [00:58<01:05, 41.6MB/s]
model-00003-of-00003.safetensors: 99%|███▉| 2.19G/2.21G [00:58<00:00, 45.7MB/s] model-00001-of-00003.safetensors: 47%|█▉ | 2.32G/4.93G [00:58<01:02, 41.9MB/s]
model-00003-of-00003.safetensors: 100%|███▉| 2.21G/2.21G [00:58<00:00, 47.3MB/s] model-00003-of-00003.safetensors: 100%|████| 2.21G/2.21G [00:58<00:00, 37.6MB/s]
model-00001-of-00003.safetensors: 48%|█▉ | 2.35G/4.93G [00:59<00:56, 45.7MB/s] model-00001-of-00003.safetensors: 48%|█▉ | 2.37G/4.93G [00:59<00:55, 46.3MB/s] model-00001-of-00003.safetensors: 49%|█▉ | 2.40G/4.93G [01:00<00:57, 43.8MB/s] model-00001-of-00003.safetensors: 49%|█▉ | 2.42G/4.93G [01:00<00:54, 46.0MB/s] model-00001-of-00003.safetensors: 49%|█▉ | 2.43G/4.93G [01:00<00:54, 46.2MB/s] model-00001-of-00003.safetensors: 50%|█▉ | 2.45G/4.93G [01:01<00:51, 47.9MB/s] model-00001-of-00003.safetensors: 50%|█▉ | 2.46G/4.93G [01:01<00:51, 47.6MB/s] model-00002-of-00003.safetensors: 48%|█▉ | 2.40G/4.98G [01:01<00:59, 43.4MB/s] model-00001-of-00003.safetensors: 50%|██ | 2.48G/4.93G [01:02<01:10, 34.9MB/s] model-00001-of-00003.safetensors: 51%|██ | 2.50G/4.93G [01:02<01:02, 39.1MB/s] model-00002-of-00003.safetensors: 49%|█▉ | 2.45G/4.98G [01:02<00:52, 48.4MB/s] model-00001-of-00003.safetensors: 51%|██ | 2.51G/4.93G [01:03<01:02, 38.9MB/s] model-00001-of-00003.safetensors: 51%|██ | 2.53G/4.93G [01:03<00:57, 41.5MB/s] model-00001-of-00003.safetensors: 52%|██ | 2.54G/4.93G [01:03<00:59, 40.2MB/s] model-00002-of-00003.safetensors: 50%|██ | 2.51G/4.98G [01:04<00:53, 46.2MB/s] model-00001-of-00003.safetensors: 52%|██ | 2.56G/4.93G [01:04<01:09, 33.8MB/s] model-00001-of-00003.safetensors: 52%|██ | 2.57G/4.93G [01:05<01:32, 25.4MB/s] model-00001-of-00003.safetensors: 52%|██ | 2.58G/4.93G [01:05<02:19, 16.9MB/s] model-00001-of-00003.safetensors: 53%|██ | 2.59G/4.93G [01:06<01:32, 25.2MB/s] model-00001-of-00003.safetensors: 53%|██ | 2.61G/4.93G [01:06<01:09, 33.3MB/s] model-00001-of-00003.safetensors: 54%|██▏ | 2.64G/4.93G [01:07<01:01, 37.4MB/s] model-00001-of-00003.safetensors: 54%|██▏ | 2.66G/4.93G [01:07<00:55, 40.8MB/s] model-00001-of-00003.safetensors: 54%|██▏ | 2.67G/4.93G [01:07<00:51, 43.8MB/s] model-00001-of-00003.safetensors: 54%|██▏ | 2.69G/4.93G [01:08<00:55, 40.2MB/s] model-00001-of-00003.safetensors: 55%|██▏ | 2.70G/4.93G [01:08<00:52, 42.8MB/s] 
model-00001-of-00003.safetensors: 55%|██▏ | 2.72G/4.93G [01:08<00:49, 45.1MB/s] model-00001-of-00003.safetensors: 55%|██▏ | 2.74G/4.93G [01:09<00:52, 41.9MB/s] model-00001-of-00003.safetensors: 56%|██▏ | 2.75G/4.93G [01:09<00:51, 42.5MB/s] model-00001-of-00003.safetensors: 56%|██▏ | 2.77G/4.93G [01:09<00:48, 44.7MB/s] model-00001-of-00003.safetensors: 56%|██▎ | 2.78G/4.93G [01:10<00:44, 48.0MB/s] model-00002-of-00003.safetensors: 56%|██▏ | 2.77G/4.98G [01:10<00:47, 46.8MB/s] model-00001-of-00003.safetensors: 57%|██▎ | 2.80G/4.93G [01:10<00:52, 40.3MB/s] model-00001-of-00003.safetensors: 57%|██▎ | 2.82G/4.93G [01:11<00:49, 42.8MB/s] model-00001-of-00003.safetensors: 57%|██▎ | 2.83G/4.93G [01:11<00:48, 43.0MB/s] model-00001-of-00003.safetensors: 58%|██▎ | 2.85G/4.93G [01:11<00:46, 44.4MB/s] model-00001-of-00003.safetensors: 58%|██▎ | 2.86G/4.93G [01:12<00:43, 47.1MB/s] model-00001-of-00003.safetensors: 58%|██▎ | 2.88G/4.93G [01:12<00:42, 48.3MB/s] model-00001-of-00003.safetensors: 59%|██▎ | 2.91G/4.93G [01:12<00:41, 48.9MB/s] model-00001-of-00003.safetensors: 59%|██▎ | 2.92G/4.93G [01:13<01:12, 27.9MB/s] model-00001-of-00003.safetensors: 59%|██▎ | 2.93G/4.93G [01:13<01:05, 30.5MB/s] model-00001-of-00003.safetensors: 60%|██▍ | 2.96G/4.93G [01:14<00:48, 40.7MB/s] model-00001-of-00003.safetensors: 60%|██▍ | 2.98G/4.93G [01:14<00:42, 45.9MB/s] model-00002-of-00003.safetensors: 59%|██▍ | 2.96G/4.98G [01:14<00:47, 42.0MB/s] model-00001-of-00003.safetensors: 61%|██▍ | 2.99G/4.93G [01:15<00:50, 38.3MB/s] model-00001-of-00003.safetensors: 61%|██▍ | 3.01G/4.93G [01:15<00:47, 40.9MB/s] model-00002-of-00003.safetensors: 60%|██▍ | 3.01G/4.98G [01:15<00:42, 46.1MB/s] model-00001-of-00003.safetensors: 61%|██▍ | 3.02G/4.93G [01:16<00:53, 35.7MB/s] model-00001-of-00003.safetensors: 62%|██▍ | 3.06G/4.93G [01:16<00:43, 43.3MB/s] model-00002-of-00003.safetensors: 61%|██▍ | 3.06G/4.98G [01:16<00:42, 45.0MB/s] model-00001-of-00003.safetensors: 62%|██▍ | 3.07G/4.93G [01:17<00:43, 42.6MB/s] 
model-00001-of-00003.safetensors: 63%|██▌ | 3.09G/4.93G [01:17<00:50, 36.2MB/s] model-00002-of-00003.safetensors: 62%|██▍ | 3.10G/4.98G [01:17<00:39, 47.4MB/s] model-00001-of-00003.safetensors: 63%|██▌ | 3.12G/4.93G [01:18<00:44, 40.4MB/s] model-00002-of-00003.safetensors: 63%|██▌ | 3.14G/4.98G [01:18<00:41, 44.6MB/s] model-00001-of-00003.safetensors: 64%|██▌ | 3.14G/4.93G [01:19<00:48, 36.8MB/s] model-00001-of-00003.safetensors: 64%|██▌ | 3.15G/4.93G [01:19<00:44, 40.2MB/s] model-00001-of-00003.safetensors: 64%|██▌ | 3.17G/4.93G [01:19<00:43, 40.6MB/s] model-00001-of-00003.safetensors: 65%|██▌ | 3.18G/4.93G [01:20<00:42, 41.5MB/s] model-00001-of-00003.safetensors: 65%|██▌ | 3.20G/4.93G [01:20<00:47, 36.2MB/s] model-00001-of-00003.safetensors: 65%|██▌ | 3.22G/4.93G [01:21<00:44, 38.8MB/s] model-00002-of-00003.safetensors: 65%|██▌ | 3.25G/4.98G [01:21<00:38, 45.1MB/s] model-00001-of-00003.safetensors: 66%|██▋ | 3.24G/4.93G [01:21<00:45, 37.3MB/s] model-00001-of-00003.safetensors: 66%|██▋ | 3.24G/4.93G [01:21<00:47, 35.5MB/s] model-00001-of-00003.safetensors: 66%|██▋ | 3.25G/4.93G [01:22<01:08, 24.5MB/s] model-00001-of-00003.safetensors: 66%|██▋ | 3.26G/4.93G [01:22<00:44, 37.4MB/s] model-00001-of-00003.safetensors: 66%|██▋ | 3.27G/4.93G [01:22<00:54, 30.6MB/s] model-00001-of-00003.safetensors: 66%|██▋ | 3.28G/4.93G [01:23<00:54, 30.3MB/s] model-00002-of-00003.safetensors: 67%|██▋ | 3.33G/4.98G [01:23<01:00, 27.0MB/s] model-00001-of-00003.safetensors: 67%|██▋ | 3.30G/4.93G [01:23<01:05, 25.0MB/s] model-00002-of-00003.safetensors: 68%|██▋ | 3.36G/4.98G [01:23<00:46, 34.6MB/s] model-00002-of-00003.safetensors: 68%|██▋ | 3.38G/4.98G [01:24<00:39, 40.3MB/s] model-00001-of-00003.safetensors: 67%|██▋ | 3.31G/4.93G [01:24<01:12, 22.4MB/s] model-00001-of-00003.safetensors: 67%|██▋ | 3.33G/4.93G [01:25<00:58, 27.6MB/s] model-00002-of-00003.safetensors: 69%|██▊ | 3.42G/4.98G [01:25<00:34, 44.8MB/s] model-00001-of-00003.safetensors: 68%|██▋ | 3.34G/4.93G [01:25<00:47, 33.2MB/s] 
model-00001-of-00003.safetensors: 69%|██▊ | 3.39G/4.93G [01:26<00:37, 41.4MB/s] model-00001-of-00003.safetensors: 69%|██▊ | 3.41G/4.93G [01:26<00:37, 40.6MB/s] model-00001-of-00003.safetensors: 69%|██▊ | 3.42G/4.93G [01:27<00:36, 41.5MB/s] model-00001-of-00003.safetensors: 70%|██▊ | 3.44G/4.93G [01:27<00:35, 42.3MB/s] model-00001-of-00003.safetensors: 70%|██▊ | 3.47G/4.93G [01:28<00:31, 46.2MB/s] model-00001-of-00003.safetensors: 71%|██▊ | 3.49G/4.93G [01:28<00:32, 44.9MB/s] model-00001-of-00003.safetensors: 71%|██▊ | 3.50G/4.93G [01:28<00:29, 47.8MB/s] model-00002-of-00003.safetensors: 71%|██▊ | 3.54G/4.98G [01:28<00:38, 37.4MB/s] model-00001-of-00003.safetensors: 72%|██▉ | 3.55G/4.93G [01:29<00:29, 46.5MB/s] model-00001-of-00003.safetensors: 72%|██▉ | 3.57G/4.93G [01:30<00:28, 47.1MB/s] model-00001-of-00003.safetensors: 73%|██▉ | 3.58G/4.93G [01:30<00:30, 43.7MB/s] model-00001-of-00003.safetensors: 73%|██▉ | 3.60G/4.93G [01:31<00:30, 43.5MB/s] model-00001-of-00003.safetensors: 74%|██▉ | 3.63G/4.93G [01:31<00:26, 48.3MB/s] model-00001-of-00003.safetensors: 74%|██▉ | 3.65G/4.93G [01:31<00:25, 49.7MB/s] model-00001-of-00003.safetensors: 74%|██▉ | 3.66G/4.93G [01:32<00:28, 44.5MB/s] model-00001-of-00003.safetensors: 75%|██▉ | 3.68G/4.93G [01:32<00:28, 43.9MB/s] model-00001-of-00003.safetensors: 75%|██▉ | 3.70G/4.93G [01:33<00:27, 44.5MB/s] model-00001-of-00003.safetensors: 75%|███ | 3.71G/4.93G [01:33<00:26, 45.8MB/s] model-00001-of-00003.safetensors: 76%|███ | 3.73G/4.93G [01:33<00:25, 47.0MB/s] model-00002-of-00003.safetensors: 75%|██▉ | 3.73G/4.98G [01:33<00:27, 45.0MB/s] model-00001-of-00003.safetensors: 76%|███ | 3.76G/4.93G [01:34<00:27, 43.2MB/s] model-00002-of-00003.safetensors: 76%|███ | 3.76G/4.98G [01:34<00:28, 43.2MB/s] model-00001-of-00003.safetensors: 77%|███ | 3.78G/4.93G [01:34<00:26, 43.9MB/s] model-00001-of-00003.safetensors: 77%|███ | 3.79G/4.93G [01:35<00:27, 41.3MB/s] model-00001-of-00003.safetensors: 77%|███ | 3.81G/4.93G [01:36<00:33, 34.1MB/s] 
model-00001-of-00003.safetensors: 78%|███ | 3.84G/4.93G [01:36<00:27, 40.0MB/s] model-00001-of-00003.safetensors: 78%|███ | 3.85G/4.93G [01:37<00:29, 36.5MB/s] model-00001-of-00003.safetensors: 78%|███▏| 3.85G/4.93G [01:37<00:28, 37.8MB/s] model-00002-of-00003.safetensors: 78%|███ | 3.87G/4.98G [01:37<00:27, 40.9MB/s] model-00002-of-00003.safetensors: 78%|███ | 3.89G/4.98G [01:37<00:26, 41.2MB/s] model-00001-of-00003.safetensors: 78%|███▏| 3.86G/4.93G [01:38<01:08, 15.8MB/s] model-00001-of-00003.safetensors: 78%|███▏| 3.87G/4.93G [01:38<00:49, 21.6MB/s] model-00001-of-00003.safetensors: 79%|███▏| 3.89G/4.93G [01:38<00:38, 27.2MB/s] model-00002-of-00003.safetensors: 79%|███▏| 3.95G/4.98G [01:39<00:21, 47.6MB/s] model-00001-of-00003.safetensors: 79%|███▏| 3.90G/4.93G [01:39<00:42, 24.1MB/s] model-00001-of-00003.safetensors: 79%|███▏| 3.92G/4.93G [01:39<00:33, 29.9MB/s] model-00002-of-00003.safetensors: 80%|███▏| 4.00G/4.98G [01:40<00:20, 47.2MB/s] model-00001-of-00003.safetensors: 80%|███▏| 3.94G/4.93G [01:40<00:32, 31.1MB/s] model-00002-of-00003.safetensors: 81%|███▏| 4.03G/4.98G [01:40<00:18, 52.4MB/s] model-00001-of-00003.safetensors: 80%|███▏| 3.95G/4.93G [01:40<00:32, 30.2MB/s] model-00001-of-00003.safetensors: 80%|███▏| 3.97G/4.93G [01:41<00:29, 32.6MB/s] model-00001-of-00003.safetensors: 81%|███▏| 3.98G/4.93G [01:41<00:26, 36.3MB/s] model-00001-of-00003.safetensors: 81%|███▏| 4.00G/4.93G [01:42<00:23, 40.2MB/s] model-00001-of-00003.safetensors: 81%|███▎| 4.02G/4.93G [01:42<00:23, 39.6MB/s] model-00001-of-00003.safetensors: 82%|███▎| 4.03G/4.93G [01:42<00:20, 42.9MB/s] model-00001-of-00003.safetensors: 82%|███▎| 4.06G/4.93G [01:43<00:18, 48.2MB/s] model-00001-of-00003.safetensors: 83%|███▎| 4.08G/4.93G [01:43<00:17, 49.5MB/s] model-00001-of-00003.safetensors: 83%|███▎| 4.10G/4.93G [01:43<00:16, 51.6MB/s] model-00001-of-00003.safetensors: 83%|███▎| 4.11G/4.93G [01:44<00:16, 50.8MB/s] model-00001-of-00003.safetensors: 84%|███▎| 4.13G/4.93G [01:44<00:15, 51.7MB/s] 
model-00001-of-00003.safetensors: 84%|███▎| 4.14G/4.93G [01:44<00:15, 50.3MB/s] model-00001-of-00003.safetensors: 84%|███▎| 4.16G/4.93G [01:45<00:16, 45.9MB/s] model-00001-of-00003.safetensors: 85%|███▍| 4.19G/4.93G [01:45<00:15, 47.5MB/s] model-00001-of-00003.safetensors: 85%|███▍| 4.21G/4.93G [01:46<00:14, 49.7MB/s] model-00001-of-00003.safetensors: 86%|███▍| 4.22G/4.93G [01:46<00:17, 40.1MB/s] model-00001-of-00003.safetensors: 86%|███▍| 4.24G/4.93G [01:47<00:16, 42.4MB/s] model-00002-of-00003.safetensors: 87%|███▍| 4.32G/4.98G [01:47<00:15, 41.9MB/s] model-00001-of-00003.safetensors: 86%|███▍| 4.26G/4.93G [01:47<00:17, 39.1MB/s] model-00001-of-00003.safetensors: 87%|███▍| 4.27G/4.93G [01:48<00:16, 40.7MB/s] model-00001-of-00003.safetensors: 87%|███▍| 4.29G/4.93G [01:48<00:15, 42.4MB/s] model-00001-of-00003.safetensors: 87%|███▍| 4.30G/4.93G [01:48<00:14, 44.5MB/s] model-00001-of-00003.safetensors: 88%|███▌| 4.32G/4.93G [01:49<00:13, 45.2MB/s] model-00002-of-00003.safetensors: 89%|███▌| 4.42G/4.98G [01:49<00:11, 47.4MB/s] model-00001-of-00003.safetensors: 88%|███▌| 4.34G/4.93G [01:49<00:19, 31.1MB/s] model-00002-of-00003.safetensors: 89%|███▌| 4.45G/4.98G [01:49<00:11, 46.4MB/s] model-00001-of-00003.safetensors: 88%|███▌| 4.35G/4.93G [01:50<00:17, 33.3MB/s] model-00001-of-00003.safetensors: 89%|███▌| 4.37G/4.93G [01:50<00:15, 35.6MB/s] model-00001-of-00003.safetensors: 89%|███▌| 4.40G/4.93G [01:51<00:12, 43.2MB/s] model-00002-of-00003.safetensors: 91%|███▋| 4.51G/4.98G [01:51<00:10, 46.0MB/s] model-00001-of-00003.safetensors: 90%|███▌| 4.42G/4.93G [01:51<00:11, 43.8MB/s] model-00001-of-00003.safetensors: 90%|███▌| 4.45G/4.93G [01:52<00:10, 44.4MB/s] model-00002-of-00003.safetensors: 92%|███▋| 4.56G/4.98G [01:52<00:09, 44.0MB/s] model-00001-of-00003.safetensors: 91%|███▋| 4.48G/4.93G [01:52<00:09, 46.9MB/s] model-00001-of-00003.safetensors: 91%|███▋| 4.50G/4.93G [01:53<00:09, 44.8MB/s] model-00001-of-00003.safetensors: 91%|███▋| 4.51G/4.93G [01:53<00:09, 42.9MB/s] 
model-00001-of-00003.safetensors: 92%|███▋| 4.53G/4.93G [01:54<00:09, 41.4MB/s] model-00002-of-00003.safetensors: 93%|███▋| 4.64G/4.98G [01:54<00:08, 38.3MB/s] model-00002-of-00003.safetensors: 94%|███▋| 4.66G/4.98G [01:54<00:07, 42.3MB/s] model-00001-of-00003.safetensors: 93%|███▋| 4.58G/4.93G [01:55<00:08, 42.4MB/s] model-00001-of-00003.safetensors: 93%|███▋| 4.59G/4.93G [01:55<00:07, 45.2MB/s] model-00001-of-00003.safetensors: 93%|███▋| 4.61G/4.93G [01:56<00:06, 46.8MB/s] model-00001-of-00003.safetensors: 94%|███▊| 4.64G/4.93G [01:56<00:05, 51.6MB/s] model-00002-of-00003.safetensors: 95%|███▊| 4.74G/4.98G [01:56<00:05, 42.0MB/s] model-00001-of-00003.safetensors: 95%|███▊| 4.67G/4.93G [01:57<00:04, 52.7MB/s] model-00002-of-00003.safetensors: 96%|███▊| 4.77G/4.98G [01:57<00:04, 46.4MB/s] model-00001-of-00003.safetensors: 95%|███▊| 4.69G/4.93G [01:57<00:04, 52.0MB/s] model-00001-of-00003.safetensors: 95%|███▊| 4.70G/4.93G [01:58<00:05, 43.0MB/s] model-00001-of-00003.safetensors: 96%|███▊| 4.74G/4.93G [01:58<00:04, 47.6MB/s] model-00001-of-00003.safetensors: 96%|███▊| 4.75G/4.93G [01:59<00:03, 48.0MB/s] model-00001-of-00003.safetensors: 97%|███▊| 4.77G/4.93G [01:59<00:03, 48.3MB/s] model-00001-of-00003.safetensors: 97%|███▉| 4.78G/4.93G [01:59<00:03, 46.4MB/s] model-00002-of-00003.safetensors: 97%|███▉| 4.85G/4.98G [01:59<00:03, 40.0MB/s] model-00001-of-00003.safetensors: 97%|███▉| 4.80G/4.93G [02:00<00:02, 45.9MB/s] model-00002-of-00003.safetensors: 98%|███▉| 4.88G/4.98G [02:00<00:02, 44.1MB/s] model-00001-of-00003.safetensors: 98%|███▉| 4.82G/4.93G [02:00<00:03, 34.5MB/s] model-00002-of-00003.safetensors: 99%|███▉| 4.91G/4.98G [02:01<00:01, 47.9MB/s] model-00001-of-00003.safetensors: 98%|███▉| 4.85G/4.93G [02:01<00:02, 38.3MB/s] model-00001-of-00003.safetensors: 99%|███▉| 4.86G/4.93G [02:01<00:01, 41.2MB/s] model-00001-of-00003.safetensors: 99%|███▉| 4.88G/4.93G [02:02<00:01, 43.4MB/s] model-00002-of-00003.safetensors: 100%|████| 4.98G/4.98G [02:02<00:00, 
40.6MB/s] model-00001-of-00003.safetensors: 100%|████| 4.93G/4.93G [02:03<00:00, 39.9MB/s]
Upload 4 LFS files: 100%|█████████████████████████| 4/4 [02:03<00:00, 31.00s/it]
2023-11-14 05:55:20 - INFO - main - Model saved to data/apt-chat-yi-6B-sft-full
[INFO|modelcard.py:452] 2023-11-14 05:55:21,054 >> Dropping the following result as it does not have all the necessary fields: {'dataset': {'name': 'communityai/apt-chat-micro-dataset-llm-v2-714k', 'type': 'communityai/apt-chat-micro-dataset-llm-v2-714k'}}
[INFO|configuration_utils.py:461] 2023-11-14 05:55:21,057 >> Configuration saved in data/apt-chat-yi-6B-sft-full/config.json
2023-11-14 05:55:21 - INFO - main - Pushing to hub...
This is probably related to flash attention being disabled and the large sequence limit of 4096 (max_seq_length). Are you using DeepSpeed? Do the Yi models not support flash-attn?
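To make the memory argument concrete, here is a rough sketch of how much memory the attention score matrices alone can take when flash attention is off and the sequence length is 4096. The head and layer counts (32 each) are assumptions about the Yi-6B architecture, not values taken from this thread, and the estimate ignores every other activation, so treat it as a lower bound rather than a measurement.

```python
# Rough estimate of attention-score memory without flash attention.
# Standard attention materializes a (batch, heads, seq, seq) score tensor per layer;
# flash attention avoids storing it. Head/layer counts are ASSUMED, not from the thread.

GIB = 1024 ** 3
NUM_HEADS = 32   # assumption for Yi-6B
NUM_LAYERS = 32  # assumption for Yi-6B


def attention_score_bytes(batch_size, num_heads, seq_len, bytes_per_element=2):
    """Bytes for one layer's score tensor; 2 bytes per element corresponds to bf16."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_element


# per_device_train_batch_size=1 and max_seq_length=4096 come from the posted config.
per_layer = attention_score_bytes(batch_size=1, num_heads=NUM_HEADS, seq_len=4096)

# With gradient_checkpointing: false, each layer's tensor is kept for the backward pass.
all_layers = per_layer * NUM_LAYERS

print(f"per layer:  {per_layer / GIB:.1f} GiB")
print(f"all layers: {all_layers / GIB:.1f} GiB")
```

Under these assumptions the score tensors grow with the square of the sequence length, so halving max_seq_length would cut this term to a quarter.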
Hi @edbeeching, just curious: can you run full SFT on a 7B model without DeepSpeed? I have tried and have never been able to run it with multi-gpu.yaml. Only DeepSpeed works, with stage 2 or stage 3.
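One possible explanation for why only DeepSpeed stage 2 or 3 works is the per-GPU memory needed for weights, gradients, and optimizer state. The sketch below uses the commonly cited accounting of 16 bytes per parameter for mixed-precision Adam (2 for bf16 weights, 2 for bf16 gradients, 12 for fp32 master weights and the two Adam moments). That accounting, the 8-GPU count, and the function name are assumptions for illustration; the actual footprint depends on the trainer configuration, and activations are not included.

```python
# Back-of-the-envelope per-GPU memory for model states during full fine-tuning.
# ASSUMPTION: 2 + 2 + 12 = 16 bytes per parameter (mixed-precision Adam accounting).
# Plain data parallelism replicates everything; ZeRO stages shard progressively more.


def per_gpu_state_gb(num_params, num_gpus, zero_stage=0):
    """Return weights + gradients + optimizer state per GPU, in GB (1e9 bytes)."""
    weights = 2 * num_params
    grads = 2 * num_params
    optim = 12 * num_params
    if zero_stage >= 1:  # stage 1 shards optimizer state
        optim /= num_gpus
    if zero_stage >= 2:  # stage 2 also shards gradients
        grads /= num_gpus
    if zero_stage >= 3:  # stage 3 also shards the weights
        weights /= num_gpus
    return (weights + grads + optim) / 1e9


params_7b = 7_000_000_000
for stage in (0, 1, 2, 3):
    label = "plain DDP" if stage == 0 else f"ZeRO stage {stage}"
    print(f"{label:>13}: {per_gpu_state_gb(params_7b, num_gpus=8, zero_stage=stage):.2f} GB")
```

Under this accounting, plain data parallelism would need more than the 80 GB of a single A100 or H100 before any activations are stored, which would be consistent with multi-gpu.yaml failing while the sharded stages fit.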
Hello everyone, I'm encountering a memory issue while fine-tuning a 7b model (such as Mistral) using a repository I found. Despite having 6 H100 GPUs at my disposal, I run into out-of-memory errors when using a batch size of 4. Interestingly, when I use libraries like Axolotl for similar tasks, I don't face this problem. Could anyone provide insights or suggestions on how to resolve these memory issues with the specific repository I'm using for fine-tuning? Any help would be greatly appreciated!
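For reference, the total batch size reported by the trainer is the product of the per-device batch size, the number of GPUs, and the gradient accumulation steps. The sketch below uses the values from the config posted in this thread; the helper name is hypothetical. Only the per-device batch size affects peak memory on each GPU, so a reported total of 24 does not by itself imply 24 samples in memory at once.

```python
# Effective (total) train batch size across devices and accumulation steps.
# The helper name is hypothetical; the formula is the usual product of the three factors.


def effective_batch_size(per_device_batch_size, num_gpus, gradient_accumulation_steps=1):
    """Samples contributing to one optimizer step across all GPUs."""
    return per_device_batch_size * num_gpus * gradient_accumulation_steps


# per_device_train_batch_size=1, 6 GPUs, gradient_accumulation_steps=4 (from the thread).
from_config = effective_batch_size(1, num_gpus=6, gradient_accumulation_steps=4)

# Per-device batch size of 4 on 6 GPUs, assuming no gradient accumulation.
batch_of_four = effective_batch_size(4, num_gpus=6)

print(from_config)
print(batch_of_four)
```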