huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Running on a single GPU (16GB) #55

Open patchie opened 1 year ago

patchie commented 1 year ago

Hi,

What is the best way to run this on my high-performance laptop? Should it work at all? And can I estimate how many days/weeks it would take?

Thanks in advance

Specs:

OS: Win 11 (WSL2)
CPU: Intel Core i7-12850HX
Make: Lenovo ThinkPad P16 Gen 1
Memory: 128GB DDR5-4800 (2400MHz)
GPU: Nvidia RTX A5500 16GB
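Before kicking off a multi-day run, it is worth a quick sanity check that the GPU and its 16GB of VRAM are actually visible from inside WSL2. A minimal sketch with PyTorch (not part of the handbook recipes):

```python
# Quick sanity check: is CUDA visible from WSL2, and how much VRAM does it report?
import torch

print(torch.cuda.is_available())         # expect True
props = torch.cuda.get_device_properties(0)
print(props.name)                         # should show the RTX A5500 Laptop GPU
print(round(props.total_memory / 1024**3, 1), "GiB total VRAM")
```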

It seems this command works on my laptop:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml --load_in_4bit=true --gradient_accumulation_steps=1024 --per_device_eval_batch_size=1 --per_device_train_batch_size=1
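For context on what those overrides imply: with per_device_train_batch_size=1 and gradient_accumulation_steps=1024 on a single process, the effective batch size is 1024, so one epoch over UltraChat's ~208k SFT examples is only a couple of hundred optimizer steps. A rough sketch of that arithmetic (the dataset size is taken from the log further down):

```python
import math

# Effective batch size = per-device batch * gradient accumulation steps * number of processes
per_device_train_batch_size = 1
gradient_accumulation_steps = 1024
num_processes = 1
train_examples = 207_865  # train_sft split size reported by run_sft.py below

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_processes
steps_per_epoch = math.ceil(train_examples / effective_batch_size)

print(effective_batch_size)  # 1024
print(steps_per_epoch)       # ~203 (the trainer itself reports 202 optimization steps)
```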

I have now run it for about 1-2 hours; here is the output so far:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_lora.yaml --load_in_4bit=true --gradient_accumulation_steps=1024 --per_device_eval_batch_size=1 --per_device_train_batch_size=1

INFO:root:Using nproc_per_node=1.
2023-11-27 15:41:33.914308: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-11-27 15:41:33.941565: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-27 15:41:34.582753: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-11-27 15:41:35,164] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.11/dist-packages/trl/trainer/ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
  warnings.warn(
2023-11-27 15:41:35 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: False
2023-11-27 15:41:35 - INFO - __main__ - Model parameters ModelArguments(base_model_revision=None, model_name_or_path='mistralai/Mistral-7B-v0.1', model_revision='main', model_code_revision=None, torch_dtype='auto', trust_remote_code=False, use_flash_attention_2=True, use_peft=True, lora_r=64, lora_alpha=16, lora_dropout=0.1, lora_target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'], lora_modules_to_save=None, load_in_8bit=False, load_in_4bit=True, bnb_4bit_quant_type='nf4', use_bnb_nested_quant=False)
2023-11-27 15:41:35 - INFO - __main__ - Data parameters DataArguments(chat_template=None, dataset_mixer={'HuggingFaceH4/ultrachat_200k': 1.0}, dataset_splits=['train_sft', 'test_sft'], max_train_samples=None, max_eval_samples=None, preprocessing_num_workers=12, truncation_side=None)
2023-11-27 15:41:35 - INFO - __main__ - Training/evaluation parameters SFTConfig(_n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=True, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, dispatch_batches=None, do_eval=True, do_predict=False, do_train=False, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=IntervalStrategy.EPOCH, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1024, gradient_checkpointing=True, gradient_checkpointing_kwargs={'use_reentrant': False}, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zephyr-7b-sft-lora, hub_private_repo=False, hub_strategy=HubStrategy.EVERY_SAVE, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=2e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=info, log_level_replica=warning, log_on_each_node=True, logging_dir=data/zephyr-7b-sft-lora/runs/Nov27_15-41-35, logging_first_step=True, logging_nan_inf_filter=True, logging_steps=5, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_type=SchedulerType.COSINE, max_grad_norm=1.0, max_seq_length=2048, max_steps=-1, metric_for_best_model=None, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1, optim=OptimizerNames.ADAMW_TORCH, optim_args=None, output_dir=data/zephyr-7b-sft-lora, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=1, per_device_train_batch_size=1, prediction_loss_only=False, push_to_hub=True, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=data/zephyr-7b-sft-lora, save_on_each_node=False, save_safetensors=True, save_steps=500, save_strategy=IntervalStrategy.NO, save_total_limit=None, seed=42, skip_memory_metrics=True, split_batches=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0)
Overwrite dataset info from restored data version if exists.
2023-11-27 15:41:38 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:38 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
2023-11-27 15:41:38 - INFO - datasets.builder - Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:38 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Overwrite dataset info from restored data version if exists.
2023-11-27 15:41:40 - INFO - datasets.builder - Overwrite dataset info from restored data version if exists.
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:40 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
2023-11-27 15:41:40 - INFO - datasets.builder - Found cached dataset ultrachat_200k (/root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458)
Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
2023-11-27 15:41:40 - INFO - datasets.info - Loading Dataset info from /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-91f7f728fecb2505.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-91f7f728fecb2505.arrow
Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-83009ff6f17d65d0.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached shuffled indices for dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-83009ff6f17d65d0.arrow
2023-11-27 15:41:40 - INFO - __main__ - Training on the following datasets and their proportions: ['train : 207865', 'test : 23110']
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/tokenizer.model
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/tokenizer.json
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/special_tokens_map.json
[INFO|tokenization_utils_base.py:2022] 2023-11-27 15:41:40,744 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/tokenizer_config.json
Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-3e95fae9b410a2c7.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-3e95fae9b410a2c7.arrow
Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-84dc14e69dab5370.arrow
2023-11-27 15:41:40 - INFO - datasets.arrow_dataset - Loading cached processed dataset at /root/.cache/huggingface/datasets/HuggingFaceH4___ultrachat_200k/default/0.0.0/e9d36c4d9da46458/cache-84dc14e69dab5370.arrow
2023-11-27 15:41:40 - INFO - __main__ - Sample 167621 of the processed training set: ........
2023-11-27 15:41:40 - INFO - __main__ - Load pretrained model
2023-11-27 15:41:40 - INFO - __main__ - Model loaded!
/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py:145: UserWarning: You passed a model_id to the SFTTrainer. This will automatically create an AutoModelForCausalLM or a PeftModel (if you passed a peft_config) for you.
  warnings.warn(
[INFO|configuration_utils.py:717] 2023-11-27 15:41:40,964 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/config.json
[INFO|configuration_utils.py:777] 2023-11-27 15:41:40,964 >> Model config MistralConfig { "_name_or_path": "mistralai/Mistral-7B-v0.1", "architectures": [ "MistralForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 10000.0, "sliding_window": 4096, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.35.0", "use_cache": false, "vocab_size": 32000 }
[INFO|modeling_utils.py:3121] 2023-11-27 15:41:40,972 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/pytorch_model.bin.index.json
[INFO|modeling_utils.py:3184] 2023-11-27 15:41:40,974 >> Will use torch_dtype=torch.bfloat16 as defined in model's config object
[INFO|modeling_utils.py:1222] 2023-11-27 15:41:40,974 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:791] 2023-11-27 15:41:40,976 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "use_cache": false }
[INFO|modeling_utils.py:3257] 2023-11-27 15:41:41,631 >> Detected 4-bit loading: activating 4-bit loading for this model
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.75s/it]
[INFO|modeling_utils.py:3950] 2023-11-27 15:41:51,332 >> All model checkpoint weights were used when initializing MistralForCausalLM.
[INFO|modeling_utils.py:3958] 2023-11-27 15:41:51,332 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at mistralai/Mistral-7B-v0.1. If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|configuration_utils.py:751] 2023-11-27 15:41:51,488 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--mistralai--Mistral-7B-v0.1/snapshots/5e9c98b96d071dce59368012254c55b0ec6f8658/generation_config.json
[INFO|configuration_utils.py:791] 2023-11-27 15:41:51,488 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 }
[INFO|training_args.py:1784] 2023-11-27 15:41:51,646 >> PyTorch: setting up devices
/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py:247: UserWarning: You passed a tokenizer with padding_side not equal to right to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding tokenizer.padding_side = 'right' to your code.
  warnings.warn(
[INFO|trainer.py:593] 2023-11-27 15:41:52,619 >> Using auto half precision backend
2023-11-27 15:41:52 - INFO - __main__ - Train
[INFO|trainer.py:1723] 2023-11-27 15:41:53,614 >> Running training
[INFO|trainer.py:1724] 2023-11-27 15:41:53,614 >> Num examples = 207,865
[INFO|trainer.py:1725] 2023-11-27 15:41:53,614 >> Num Epochs = 1
[INFO|trainer.py:1726] 2023-11-27 15:41:53,614 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1729] 2023-11-27 15:41:53,614 >> Total train batch size (w. parallel, distributed & accumulation) = 1,024
[INFO|trainer.py:1730] 2023-11-27 15:41:53,614 >> Gradient Accumulation steps = 1024
[INFO|trainer.py:1731] 2023-11-27 15:41:53,614 >> Total optimization steps = 202
[INFO|trainer.py:1732] 2023-11-27 15:41:53,616 >> Number of trainable parameters = 54,525,952
0%| | 0/202 [00:00<?, ?it/s]
[WARNING|tokenization_utils_base.py:3831] 2023-11-27 15:41:54,956 >> Token indices sequence length is longer than the specified maximum sequence length for this model (2377 > 2048). Running this sequence through the model will result in indexing errors
[WARNING|logging.py:314] 2023-11-27 15:41:55,018 >> You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
[WARNING|logging.py:329] 2023-11-27 15:41:55,763 >> The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.bfloat16.
[W reducer.cpp:1346] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
{'loss': 1.1453, 'learning_rate': 1.9998790632601496e-05, 'epoch': 0.0}
0%|▌ | 1/202 [4:36:47<927:14:16, 16607.25s/it]
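To answer my own question about estimating the duration: once the first optimizer step finishes, tqdm's seconds-per-step figure makes the projection straightforward. A back-of-envelope sketch (the first step includes warm-up, so the steady-state rate is lower; the numbers are taken from the logs above and the update below):

```python
# Rough wall-clock projection from the trainer's own numbers.
total_steps = 202            # "Total optimization steps" reported above
first_step_seconds = 16607   # tqdm estimate after step 1 (includes warm-up overhead)
steady_state_seconds = 4100  # roughly what later steps settle to (see the update below)

for s_per_step in (first_step_seconds, steady_state_seconds):
    days = total_steps * s_per_step / 86_400
    print(f"{s_per_step} s/step -> ~{days:.1f} days for {total_steps} steps")
# 16607 s/step -> ~38.8 days; 4100 s/step -> ~9.6 days for the full epoch
```

The update below is consistent with the lower figure: 136 of the 202 steps took roughly 6.5 days. If that is too long, limiting max_train_samples/max_eval_samples (both exposed in DataArguments above) is probably the easiest way to shorten a test run.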

patchie commented 11 months ago

Just wanted to update with the results of the training, in case it helps anyone:

{'loss': 1.0238, 'learning_rate': 1.2462789068320016e-05, 'epoch': 0.42}
{'loss': 1.013, 'learning_rate': 1.1702435557223988e-05, 'epoch': 0.44}
{'loss': 1.022, 'learning_rate': 1.0931792674840718e-05, 'epoch': 0.47}
{'loss': 1.0153, 'learning_rate': 1.0155518119203511e-05, 'epoch': 0.49}
{'loss': 1.0143, 'learning_rate': 9.378303625685196e-06, 'epoch': 0.52}
{'loss': 1.0191, 'learning_rate': 8.604846610560771e-06, 'epoch': 0.54}
{'loss': 1.0176, 'learning_rate': 7.839821780235168e-06, 'epoch': 0.57}
{'loss': 1.0169, 'learning_rate': 7.0878528777274814e-06, 'epoch': 0.59}
{'loss': 1.0168, 'learning_rate': 6.35348473717345e-06, 'epoch': 0.62}
{'loss': 1.0117, 'learning_rate': 5.64115581524629e-06, 'epoch': 0.64}
{'loss': 1.0106, 'learning_rate': 4.955171365513603e-06, 'epoch': 0.67}
67%|██████▋ | 136/202 [150:33:44<70:43:02, 3857.31s/it]
[INFO|trainer.py:3158] 2023-12-10 17:22:46,485 >> Running Evaluation
[INFO|trainer.py:3160] 2023-12-10 17:22:46,486 >> Num examples = 23110
[INFO|trainer.py:3163] 2023-12-10 17:22:46,486 >> Batch size = 1
{'eval_loss': 1.0159717798233032, 'eval_runtime': 19243.2251, 'eval_samples_per_second': 1.201, 'eval_steps_per_second': 1.201, 'epoch': 0.67}
67%|██████▋ | 136/202 [156:04:41<70:43:02, 3857.31s/it]
[INFO|trainer.py:1955] 2023-12-10 22:43:29,715 >>

Training completed. Do not forget to share your model on huggingface.co/models =)

{'train_runtime': 561882.257, 'train_samples_per_second': 0.37, 'train_steps_per_second': 0.0, 'train_loss': 1.0438810963841045, 'epoch': 0.67}
67%|██████▋ | 136/202 [156:04:41<75:44:37, 4131.48s/it]
train metrics
  epoch                    = 0.67
  train_loss               = 1.0439
  train_runtime            = 6 days, 12:04:42.25
  train_samples            = 207865
  train_samples_per_second = 0.37
  train_steps_per_second   = 0.0
2023-12-10 22:43:29 - INFO - __main__ - Evaluate
[INFO|trainer.py:3158] 2023-12-10 22:43:29,739 >> Running Evaluation
[INFO|trainer.py:3160] 2023-12-10 22:43:29,739 >> Num examples = 23110
[INFO|trainer.py:3163] 2023-12-10 22:43:29,739 >> Batch size = 1
67%|██████▋ | 15431/23110 [5:22:04<2:40:16, 1.25s/it]
eval metrics
  epoch                   = 0.67
  eval_loss               = 1.016
  eval_runtime            = 5:22:05.99
  eval_samples            = 23110
  eval_samples_per_second = 1.196
  eval_steps_per_second   = 1.196
2023-12-11 04:05:35 - INFO - __main__ - Save model
[INFO|trainer.py:2881] 2023-12-11 04:05:35,784 >> Saving model checkpoint to data/zephyr-7b-sft-lora
[INFO|tokenization_utils_base.py:2428] 2023-12-11 04:05:39,111 >> tokenizer config file saved in data/zephyr-7b-sft-lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-11 04:05:39,115 >> Special tokens file saved in data/zephyr-7b-sft-lora/special_tokens_map.json
[INFO|trainer.py:2881] 2023-12-11 04:05:39,299 >> Saving model checkpoint to data/zephyr-7b-sft-lora
[INFO|tokenization_utils_base.py:2428] 2023-12-11 04:05:41,961 >> tokenizer config file saved in data/zephyr-7b-sft-lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-11 04:05:41,966 >> Special tokens file saved in data/zephyr-7b-sft-lora/special_tokens_map.json
events.out.tfevents.1702263935.17694.1: 100%|██████████| 359/359 [00:01<00:00, 189B/s]
events.out.tfevents.1701096113.9499.0: 100%|██████████| 8.50k/8.50k [00:01<00:00, 4.45kB/s]
events.out.tfevents.1701681021.4007.0: 100%|██████████| 4.65k/4.65k [00:01<00:00, 2.40kB/s]
events.out.tfevents.1701682727.17694.0: 100%|██████████| 9.59k/9.59k [00:01<00:00, 4.94kB/s]
training_args.bin: 100%|██████████| 4.66k/4.66k [00:00<00:00, 27.4kB/s]
tokenizer.model: 100%|██████████| 493k/493k [00:00<00:00, 692kB/s]
adapter_model.safetensors: 100%|██████████| 218M/218M [00:19<00:00, 11.2MB/s]
Upload 7 LFS files: 100%|██████████| 7/7 [00:20<00:00, 2.87s/it]
2023-12-11 04:06:06 - INFO - __main__ - Model saved to data/zephyr-7b-sft-lora
[INFO|modelcard.py:452] 2023-12-11 04:06:06,770 >> Dropping the following result as it does not have all the necessary fields: {'dataset': {'name': 'HuggingFaceH4/ultrachat_200k', 'type': 'HuggingFaceH4/ultrachat_200k'}}
[INFO|configuration_utils.py:461] 2023-12-11 04:06:06,779 >> Configuration saved in data/zephyr-7b-sft-lora/config.json
2023-12-11 04:06:06 - INFO - __main__ - Pushing to hub...
[INFO|trainer.py:2881] 2023-12-11 04:06:06,779 >> Saving model checkpoint to data/zephyr-7b-sft-lora
[INFO|tokenization_utils_base.py:2428] 2023-12-11 04:06:09,653 >> tokenizer config file saved in data/zephyr-7b-sft-lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2437] 2023-12-11 04:06:09,659 >> Special tokens file saved in data/zephyr-7b-sft-lora/special_tokens_map.json
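In case anyone wants to try the resulting adapter: below is a minimal, untested sketch of loading the LoRA weights on top of the 4-bit base model with transformers + peft. The paths and the chat format are assumptions based on the output_dir/hub_model_id above and the Zephyr prompt style; adjust to your own setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter = "data/zephyr-7b-sft-lora"  # or the pushed Hub repo, e.g. "<your-username>/zephyr-7b-sft-lora"

# Same NF4 4-bit setup that was used for training.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(adapter)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Zephyr-style prompt format (assumption; the SFT config leaves chat_template=None).
prompt = "<|user|>\nWhat does LoRA fine-tuning do?</s>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```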