Please provide a clear and concise description of what the question is.
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=False,
ddp_timeout=30000,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=50,
evaluation_strategy=steps,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
generation_config=None,
generation_max_length=None,
generation_num_beams=None,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=e:/Liuc/llama-main/models_hf/7B-Chat-Finetuned\runs\Nov01_10-38-31_PUERSAI-PC,
logging_first_step=True,
logging_nan_inf_filter=True,
logging_steps=10,
logging_strategy=steps,
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
output_dir=e:/Liuc/llama-main/models_hf/7B-Chat-Finetuned,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=4,
predict_with_generate=False,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
resume_from_checkpoint=None,
run_name=e:/Liuc/llama-main/models_hf/7B-Chat-Finetuned,
save_on_each_node=False,
save_safetensors=False,
save_steps=500,
save_strategy=steps,
save_total_limit=3,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
sortish_sampler=False,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.05,
warmup_steps=0,
weight_decay=0.05,
)
2023-11-01 10:38:31.355 | INFO | main:main:863 - Script args: ScriptArguments(use_peft=True, target_modules='all', lora_rank=8, lora_dropout=0.05, lora_alpha=16.0, modules_to_save=None, peft_path=None, qlora=False, model_max_length=512)
2023-11-01 10:38:31.355 | INFO | main:main:864 - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, 16-bits training: True
2023-11-01 10:38:31.450 | INFO | main:main:892 - Add pad token:
2023-11-01 10:38:31.451 | DEBUG | main:main:894 - Tokenizer: LlamaTokenizer(name_or_path='e:/Liuc/llama-main/models_hf/7B-Chat', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': '', 'eos_token': '', 'unk_token': '', 'pad_token': ''}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
2023-11-01 10:38:31.452 | INFO | main:main:924 - train files: ['./data/finetune/train\train_data_LLAMA_instruction_alpaca_noinstruct.jsonl']
2023-11-01 10:38:31.453 | INFO | main:main:929 - eval files: ['./data/finetune/test\test_data_LLAMA_instruction_alpaca_noinstruct.jsonl']
2023-11-01 10:38:32.741 | INFO | main:main:950 - Raw datasets: DatasetDict({
    train: Dataset({
        features: ['conversations'],
        num_rows: 202532
    })
    validation: Dataset({
        features: ['conversations'],
        num_rows: 50633
    })
})
2023-11-01 10:38:32.742 | DEBUG | main:main:1038 - Example train_dataset[0]: {'conversations': [{'from': 'human', 'value': "As a proficient assistant, ensure to give accurate and comprehensive responses to the specified questions or tasks.Which university received a NIH grant for 'Micro Coherence Imaging Technology for Assessing Obstructive Lung Disease In Vivo' in 2016?"}, {'from': 'gpt', 'value': "Johns Hopkins University received a 2016 NIH grant for 'Micro Coherence Imaging Technology for Assessing Obstructive Lung Disease In Vivo'."}]}
Running tokenizer on train dataset (num_proc=4): 0%| | 0/202532 [00:00<?, ? examples/s]
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
Running tokenizer on train dataset (num_proc=4): 100%|██████████| 202532/202532 [00:52<00:00, 3828.35 examples/s]
Filter (num_proc=4): 0%| | 0/202532 [00:00<?, ? examples/s]
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
Filter (num_proc=4): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 202532/202532 [00:18<00:00, 11065.27 examples/s]
2023-11-01 10:39:44.347 | DEBUG | main:main:1049 - Num train_samples: 202532
2023-11-01 10:39:44.348 | DEBUG | main:main:1050 - Tokenized training example:
2023-11-01 10:39:44.352 | DEBUG | main:main:1051 - Decode input_ids[0]: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: You are a capable assistant tasked with delivering precise and well-informed responses to the presented questions or tasks.What methods were used in the study on proliferative diabetic retinopathy? ASSISTANT: Patients were routinely imaged with a standardized PDR-protocol between March 2017 and January 2019. This included a 12x9 mm structural OCT volume centered on the macula and a 6x6 mm OCTA scan centered on the optic nerve head obtained using a Topcon swept-source system. Ultra-widefield fluorescein angiography (FA) was also performed when clinically indicated.
2023-11-01 10:39:44.356 | DEBUG | main:main:1054 - Decode labels[0]: Patients were routinely imaged with a standardized PDR-protocol between March 2017 and January 2019. This included a 12x9 mm structural OCT volume centered on the macula and a 6x6 mm OCTA scan centered on the optic nerve head obtained using a Topcon swept-source system. Ultra-widefield fluorescein angiography (FA) was also performed when clinically indicated.
2023-11-01 10:39:44.357 | DEBUG | main:main:1067 - Example eval_dataset[0]: {'conversations': [{'from': 'human', 'value': 'As a helpful assistant, provide accurate and informative responses to the given questions or tasks based on the provided text. Ensure your answers are precise and to the point.What was the diagnosis of the 16-year-old boy who presented with best-corrected visual acuity of 6/18 OD?'}, {'from': 'gpt', 'value': 'The 16-year-old boy was diagnosed with choroidal osteoma 1 (CO).'}]}
Running tokenizer on validation dataset (num_proc=4): 0%| | 0/50633 [00:00<?, ? examples/s]
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
Running tokenizer on validation dataset (num_proc=4): 100%|███████████████████████████████████████████████████████████████████████████| 50633/50633 [00:17<00:00, 2832.42 examples/s]
Filter (num_proc=4): 0%| | 0/50633 [00:00<?, ? examples/s]
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
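
For reference, here is a rough sketch of the step counts implied by the settings logged above. It only approximates how the Hugging Face `Trainer` derives them (rounding up the last partial batch, since `dataloader_drop_last=False`), and the variable names are my own, so treat the numbers as an estimate and not as output from the run.

```python
import math

# Values copied from the log above.
num_train_samples = 202532        # "Num train_samples" after filtering
num_eval_samples = 50633          # size of the validation split
per_device_train_batch_size = 4
per_device_eval_batch_size = 4
gradient_accumulation_steps = 1
n_gpu = 1
num_train_epochs = 3
warmup_ratio = 0.05
eval_steps = 50

# Samples consumed by one optimizer step.
samples_per_step = per_device_train_batch_size * gradient_accumulation_steps * n_gpu

# The last partial batch is kept, so round up.
steps_per_epoch = math.ceil(num_train_samples / samples_per_step)
total_steps = steps_per_epoch * num_train_epochs

# Assumption: warmup length is the ratio of total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

# With evaluation_strategy=steps, an evaluation pass runs every eval_steps steps.
eval_runs = total_steps // eval_steps
batches_per_eval = math.ceil(num_eval_samples / (per_device_eval_batch_size * n_gpu))

print(f"steps per epoch:      {steps_per_epoch}")
print(f"total training steps: {total_steps}")
print(f"warmup steps:         {warmup_steps}")
print(f"evaluation passes:    {eval_runs}")
print(f"batches per eval:     {batches_per_eval}")
```

With `eval_steps=50` and the full validation split, each evaluation pass covers far more batches than the 50 training steps between passes, which may be worth keeping in mind when reading the timings.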
[00:33<00:21, 373Running tokenizer on train dataset (num_proc=4): 60%|█████████████████████████████████████████████████████████▏ | 122000/202532 [00:33<00:19, 411Running tokenizer on train dataset (num_proc=4): 61%|██████████████████████████████████████████████████████████▏ | 124000/202532 [00:33<00:14, 556Running tokenizer on train dataset (num_proc=4): 62%|███████████████████████████████████████████████████████████ | 126000/202532 [00:34<00:19, 391Running tokenizer on train dataset (num_proc=4): 63%|████████████████████████████████████████████████████████████ | 128000/202532 [00:34<00:14, 514Running tokenizer on train dataset (num_proc=4): 64%|████████████████████████████████████████████████████████████▉ | 130000/202532 [00:35<00:19, 380Running tokenizer on train dataset (num_proc=4): 65%|█████████████████████████████████████████████████████████████▉ | 132000/202532 [00:35<00:14, 492Running tokenizer on train dataset (num_proc=4): 66%|██████████████████████████████████████████████████████████████▊ | 134000/202532 [00:36<00:17, 380Running tokenizer on train dataset (num_proc=4): 67%|███████████████████████████████████████████████████████████████▊ | 136000/202532 [00:36<00:13, 484Running tokenizer on train dataset (num_proc=4): 68%|████████████████████████████████████████████████████████████████▎ | 137000/202532 [00:36<00:19, 335Running tokenizer on train dataset (num_proc=4): 69%|█████████████████████████████████████████████████████████████████▋ | 140000/202532 [00:37<00:12, 503Running tokenizer on train dataset (num_proc=4): 70%|██████████████████████████████████████████████████████████████████▏ | 141000/202532 [00:37<00:17, 344Running tokenizer on train dataset (num_proc=4): 71%|███████████████████████████████████████████████████████████████████▌ | 144000/202532 [00:38<00:11, 508Running tokenizer on train dataset (num_proc=4): 72%|████████████████████████████████████████████████████████████████████ | 145000/202532 [00:38<00:16, 347Running tokenizer on train 
dataset (num_proc=4): 73%|█████████████████████████████████████████████████████████████████████▍ | 148000/202532 [00:39<00:10, 510Running tokenizer on train dataset (num_proc=4): 74%|█████████████████████████████████████████████████████████████████████▉ | 149000/202532 [00:39<00:15, 348Running tokenizer on train dataset (num_proc=4): 75%|███████████████████████████████████████████████████████████████████████▎ | 152000/202532 [00:39<00:09, 517Running tokenizer on train dataset (num_proc=4): 76%|███████████████████████████████████████████████████████████████████████▊ | 153000/202532 [00:40<00:14, 349Running tokenizer on train dataset (num_proc=4): 77%|█████████████████████████████████████████████████████████████████████████▏ | 156000/202532 [00:40<00:08, 517Running tokenizer on train dataset (num_proc=4): 78%|█████████████████████████████████████████████████████████████████████████▋ | 157000/202532 [00:41<00:13, 345Running tokenizer on train dataset (num_proc=4): 79%|███████████████████████████████████████████████████████████████████████████ | 160000/202532 [00:41<00:08, 513Running tokenizer on train dataset (num_proc=4): 79%|███████████████████████████████████████████████████████████████████████████▌ | 161000/202532 [00:42<00:11, 353Running tokenizer on train dataset (num_proc=4): 80%|███████████████████████████████████████████████████████████████████████████▉ | 162000/202532 [00:42<00:10, 400Running tokenizer on train dataset (num_proc=4): 81%|████████████████████████████████████████████████████████████████████████████▉ | 164000/202532 [00:42<00:07, 550Running tokenizer on train dataset (num_proc=4): 82%|█████████████████████████████████████████████████████████████████████████████▊ | 166000/202532 [00:43<00:09, 388Running tokenizer on train dataset (num_proc=4): 83%|██████████████████████████████████████████████████████████████████████████████▊ | 168000/202532 [00:43<00:06, 522Running tokenizer on train dataset (num_proc=4): 
84%|███████████████████████████████████████████████████████████████████████████████▋ | 170000/202532 [00:44<00:08, 381Running tokenizer on train dataset (num_proc=4): 85%|████████████████████████████████████████████████████████████████████████████████▋ | 172000/202532 [00:44<00:06, 498Running tokenizer on train dataset (num_proc=4): 86%|█████████████████████████████████████████████████████████████████████████████████▌ | 174000/202532 [00:45<00:07, 377Running tokenizer on train dataset (num_proc=4): 87%|██████████████████████████████████████████████████████████████████████████████████▌ | 176000/202532 [00:45<00:05, 497Running tokenizer on train dataset (num_proc=4): 88%|███████████████████████████████████████████████████████████████████████████████████▍ | 178000/202532 [00:46<00:06, 376Running tokenizer on train dataset (num_proc=4): 89%|████████████████████████████████████████████████████████████████████████████████████▍ | 180000/202532 [00:46<00:04, 483Running tokenizer on train dataset (num_proc=4): 90%|█████████████████████████████████████████████████████████████████████████████████████▎ | 182000/202532 [00:47<00:05, 375Running tokenizer on train dataset (num_proc=4): 91%|██████████████████████████████████████████████████████████████████████████████████████▎ | 184000/202532 [00:47<00:03, 472Running tokenizer on train dataset (num_proc=4): 91%|██████████████████████████████████████████████████████████████████████████████████████▊ | 185000/202532 [00:48<00:05, 338Running tokenizer on train dataset (num_proc=4): 92%|███████████████████████████████████████████████████████████████████████████████████████▏ | 186000/202532 [00:48<00:04, 385Running tokenizer on train dataset (num_proc=4): 93%|████████████████████████████████████████████████████████████████████████████████████████▏ | 188000/202532 [00:48<00:02, 524Running tokenizer on train dataset (num_proc=4): 93%|████████████████████████████████████████████████████████████████████████████████████████▋ | 189000/202532 
[00:49<00:04, 337Running tokenizer on train dataset (num_proc=4): 94%|█████████████████████████████████████████████████████████████████████████████████████████ | 190000/202532 [00:49<00:03, 395Running tokenizer on train dataset (num_proc=4): 95%|██████████████████████████████████████████████████████████████████████████████████████████ | 192000/202532 [00:49<00:01, 547Running tokenizer on train dataset (num_proc=4): 95%|██████████████████████████████████████████████████████████████████████████████████████████▌ | 193000/202532 [00:50<00:02, 338Running tokenizer on train dataset (num_proc=4): 96%|██████████████████████████████████████████████████████████████████████████████████████████▉ | 194000/202532 [00:50<00:02, 398Running tokenizer on train dataset (num_proc=4): 97%|███████████████████████████████████████████████████████████████████████████████████████████▉ | 196000/202532 [00:50<00:01, 566Running tokenizer on train dataset (num_proc=4): 97%|████████████████████████████████████████████████████████████████████████████████████████████▍ | 197000/202532 [00:51<00:01, 342Running tokenizer on train dataset (num_proc=4): 98%|████████████████████████████████████████████████████████████████████████████████████████████▊ | 198000/202532 [00:51<00:01, 394Running tokenizer on train dataset (num_proc=4): 99%|█████████████████████████████████████████████████████████████████████████████████████████████▊ | 200000/202532 [00:51<00:00, 573Running tokenizer on train dataset (num_proc=4): 99%|██████████████████████████████████████████████████████████████████████████████████████████████▍| 201266/202532 [00:51<00:00, 436Running tokenizer on train dataset (num_proc=4): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 202532/202532 [00:51<00:00, 525Running tokenizer on train dataset (num_proc=4): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 202532/202532 [00:52<00:00, 3828.35 
examples/s] Filter (num_proc=4): 0%| | 0/202532 [00:00<?, ? examples/s]binbin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dllD:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
Filter (num_proc=4): 100%|██████████| 202532/202532 [00:18<00:00, 11065.27 examples/s]
2023-11-01 10:39:44.347 | DEBUG | main:main:1049 - Num train_samples: 202532
2023-11-01 10:39:44.348 | DEBUG | main:main:1050 - Tokenized training example:
2023-11-01 10:39:44.352 | DEBUG | main:main:1051 - Decode input_ids[0]:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.USER: You are a capable assistant tasked with delivering precise and well-informed responses to the presented questions or tasks.What methods were used in the study on proliferative diabetic retinopathy? ASSISTANT: Patients were routinely imaged with a standardized PDR-protocol between March 2017 and January 2019. This included a 12x9 mm structural OCT volume centered on the macula and a 6x6 mm OCTA scan centered on the optic nerve head obtained using a Topcon swept-source system. Ultra-widefield fluorescein angiography (FA) was also performed when clinically indicated.
2023-11-01 10:39:44.356 | DEBUG | main:main:1054 - Decode labels[0]:
Patients were routinely imaged with a standardized PDR-protocol between March 2017 and January 2019. This included a 12x9 mm structural OCT volume centered on the macula and a 6x6 mm OCTA scan centered on the optic nerve head obtained using a Topcon swept-source system. Ultra-widefield fluorescein angiography (FA) was also performed when clinically indicated.
2023-11-01 10:39:44.357 | DEBUG | main:main:1067 - Example eval_dataset[0]: {'conversations': [{'from': 'human', 'value': 'As a helpful assistant, provide accurate and informative responses to the given questions or tasks based on the provided text. Ensure your answers are precise and to the point.What was the diagnosis of the 16-year-old boy who presented with best-corrected visual acuity of 6/18 OD?'}, {'from': 'gpt', 'value': 'The 16-year-old boy was diagnosed with choroidal osteoma 1 (CO).'}]}
Running tokenizer on validation dataset (num_proc=4): 0%| | 0/50633 [00:00<?, ? examples/s]
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
Running tokenizer on validation dataset (num_proc=4): 100%|██████████| 50633/50633 [00:17<00:00, 2832.42 examples/s]
Filter (num_proc=4): 0%| | 0/50633 [00:00<?, ? examples/s]
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
bin D:\anaconda\envs\LLAMA\lib\site-packages\bitsandbytes\libbitsandbytes_cuda121.dll
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
FlashAttention-2 is not installed, ignore this if you are not using FlashAttention.
Filter (num_proc=4): 100%|██████████| 50633/50633 [00:07<00:00, 6350.83 examples/s]
2023-11-01 10:40:10.626 | DEBUG | main:main:1077 - Num eval_samples: 50633
2023-11-01 10:40:10.627 | DEBUG | main:main:1078 - Tokenized eval example:
2023-11-01 10:40:10.630 | DEBUG | main:main:1079 -
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.USER: As a helpful assistant, provide accurate and informative responses to the given questions or tasks based on the provided text. Ensure your answers are precise and to the point.What was the diagnosis of the 16-year-old boy who presented with best-corrected visual acuity of 6/18 OD? ASSISTANT: The 16-year-old boy was diagnosed with choroidal osteoma 1 (CO).
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Loading checkpoint shards: 100%|██████████| 2/2 [00:06<00:00, 3.13s/it]
2023-11-01 10:40:17.863 | INFO | main:main:1195 - Fine-tuning method: LoRA(PEFT)
2023-11-01 10:40:17.864 | INFO | main:main:1200 - Init new peft model
2023-11-01 10:40:17.865 | INFO | main:main:1209 - Peft target_modules: ['down_proj', 'gate_proj', 'k_proj', 'o_proj', 'q_proj', 'up_proj', 'v_proj']
2023-11-01 10:40:17.866 | INFO | main:main:1210 - Peft lora_rank: 8
trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2957573965106688
2023-11-01 10:40:48.923 | INFO | main:main:1256 - Train
2023-11-01 10:40:48.938 | DEBUG | main:main:1259 - Train dataloader example: {'input_ids': tensor([[ 1, 319, 13563, ..., 0, 0, 0], [ 1, 319, 13563, ..., 678, 2, 0], [ 1, 319, 13563, ..., 0, 0, 0], [ 1, 319, 13563, ..., 0, 0, 0]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, ..., 0, 0, 0], [1, 1, 1, ..., 1, 1, 0], [1, 1, 1, ..., 0, 0, 0], [1, 1, 1, ..., 0, 0, 0]], device='cuda:0'), 'labels': tensor([[-100, -100, -100, ..., -100, -100, -100], [-100, -100, -100, ..., 678, 2, -100], [-100, -100, -100, ..., -100, -100, -100], [-100, -100, -100, ..., -100, -100, -100]], device='cuda:0')}
2023-11-01 10:40:49.044 | DEBUG | main:main:1260 - Detail input_ids:
[tensor([ 1, 319, 13563, 1546, 263, 12758, 1404, 322, 385, 23116, 21082, 20255, 29889, 450, 20255, 4076, 8444, 29892, 13173, 29892, 322, 1248, 568, 6089, 304, 278, 1404, 29915, 29879, 5155, 29889, 2, 3148, 1001, 29901, 1128, 947, 13327, 1207, 6909, 29973, 319, 1799, 9047, 13566, 29901, 29871, 13327, 3732, 1556, 310, 967, 6909, 1549, 18811, 5921, 29889, 2087, 1765, 275, 414, 5146, 278, 5001, 304, 1510, 594, 29879, 304, 967, 4160, 29892, 2729, 373, 1009, 20017, 29892, 4010, 18930, 29892, 4423, 29892, 322, 916, 848, 16531, 491, 13327, 29889, 910, 338, 278,
2769, 13327, 338, 3889, 363, 967, 4160, 29936, 278, 5001, 3732, 6909, 515, 278, 18811, 275, 4110, 29889, 13, 13, 23360, 2909, 884, 2326, 1983, 777, 337, 9947, 515, 967, 916, 5786, 1316, 408, 13327, 28794, 6689, 29892, 988, 4160, 508, 15649, 322, 19417, 9316, 29892, 322, 13327, 402, 11500, 29892, 988, 278, 5001, 4893, 263, 5700, 310, 278, 337, 9947, 5759, 515, 8090, 5318, 373, 967, 7481, 29889, 1670, 526, 884, 777, 916, 9461, 8974, 3704, 5188, 1974, 5786, 363, 5381, 267, 29892, 4066, 20591, 491, 967, 274, 1161, 23986, 29892, 322, 269, 7807, 438, 1810, 375, 478, 29934, 12837, 29889, 13, 13, 797, 15837, 29892, 278, 13638, 310, 13327, 29915, 29879, 337, 9947, 5304, 515, 18811, 5921, 29892, 322, 278, 5001, 29915, 29879, 11509, 304, 3646, 594, 29879, 304, 967, 11118, 1080, 310, 4160, 338, 825, 3732, 372, 697, 310, 278, 1556, 21114, 18811, 5921, 18196, 297, 278, 3186, 29889, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0'), tensor([ 1, 319, 13563, 1546, 263, 12758, 1404, 322, 385, 23116, 21082, 20255, 29889, 450, 20255, 4076, 8444, 29892, 13173, 29892, 322, 1248, 568, 6089, 304, 278, 1404, 29915, 29879, 5155, 29889, 2, 3148, 1001, 29901, 25589, 7118, 263, 5183, 1051, 363, 29871, 29947, 1629, 2030, 29879, 29889, 
319, 1799, 9047, 2266, 526, 3006, 5972, 29892, 1035, 13190, 29892, 322, 3033, 6751, 8277, 363, 29871, 29947, 1629, 2030, 29879, 29901, 13, 13, 29896, 29889, 376, 1576, 10213, 19152, 414, 29908, 491, 1605, 296, 265, 9371, 22389, 29901, 530, 6382, 262, 1230, 322, 1468, 8873, 17623, 545, 1048, 263, 8023, 1058, 17021, 874, 263, 22277, 2738, 6505, 411, 7934, 10801, 29889, 13, 13, 29906, 29889, 376, 5914, 8276, 371, 29915, 29879, 2563, 29908, 491, 382, 29889, 29933, 29889, 8037, 29901, 319, 5335, 6393, 22037, 1048, 278, 27994, 1546, 263, 282, 335, 4257, 4624, 8399, 322, 263, 805, 1241, 4257, 21499, 29889, 13, 13, 29941, 29889, 376, 1576, 3118, 322, 9333, 16560, 29908, 491, 476, 16490, 2401, 6045, 29901, 319, 9560, 368, 3971, 322, 26848, 9554, 1048, 263, 330, 272, 2911, 8471, 297, 263, 17394, 3262, 286, 497, 1058, 24298, 1983, 1048, 27994, 322, 278, 13500, 310, 9138, 263, 3271, 29889, 13, 13, 29946, 29889, 376, 29925, 22425, 6242, 17712, 292, 29908, 491, 319, 710, 333, 17277, 20378, 29901, 910, 1339, 8238, 29892, 2090, 322, 439, 381, 3459, 5828, 1048, 263, 14183, 29899, 6360, 29899, 1025, 7826, 29892, 7362, 19788, 29892, 411, 2428, 26029, 9324, 1058, 12080, 411, 263, 1601, 1989, 322, 263, 10435, 322, 4947, 964, 599, 17690, 310, 4147, 19363, 29889, 13, 13, 29945, 29889, 376, 9782, 25949, 29908, 491, 1528, 2741, 360, 4494, 29901, 319, 3165, 20657, 322, 5192, 29893, 2817, 292, 17694, 1048, 263, 4123, 7826, 411, 28163, 10801, 322, 902, 16342, 304, 1284, 5360, 322, 3544, 749, 29889, 13, 13, 29953, 29889, 376, 1576, 26494, 15472, 5619, 29908, 491, 6182, 20635, 6657, 4089, 484, 29901, 319, 3652, 310, 8277, 1048, 263, 8099, 322, 9883, 868, 29877, 1058, 6523, 263, 2320, 936, 5447, 8697, 393, 4893, 963, 373, 17623, 1973, 10106, 4955, 29889, 13, 13, 29955, 29889, 376, 1576, 5493, 945, 352, 681, 435, 473, 3801, 310, 9300, 27415, 1662, 29908, 491, 23738, 4671, 29907, 1344, 417, 29901, 530, 23023, 1848, 322, 6023, 292, 17694, 310, 263, 1277, 2242, 475, 27127, 277, 29915, 29879, 16342, 
304, 1284, 5360, 322, 278, 1565, 6593, 310, 3271, 29889, 13, 13, 29947, 29889, 376, 14438, 513, 280, 29908, 491, 11571, 26769, 29879, 29901, 319, 2714, 29899, 16123, 17223, 322, 298, 309, 1306, 681, 5828, 1048, 263, 8023, 1058, 11817, 29879, 263, 716, 1734, 322, 6166, 1283, 263, 9704, 310, 4959, 393, 16267, 29879, 263, 5233, 8157, 27836, 29889, 13, 13, 29929, 29889, 376, 29911, 2122, 310, 263, 12458, 386, 4989, 311, 9531, 29908, 491, 8660, 29891, 3164, 2017, 29901, 319, 2090, 1460, 322, 1104, 17219, 5828, 1048, 263, 8023, 16743, 411, 1432, 3250, 2834, 322, 278, 3677, 1199, 310, 670, 286, 783, 10384, 681, 20023, 8099, 29889, 13, 13, 29896, 29900, 29889, 376, 1576, 11773, 4287, 20986, 29908, 491, 5681, 509, 1151, 678, 2, 0], device='cuda:0'), tensor([ 1, 319, 13563, 1546, 263, 12758, 1404, 322, 385, 23116, 21082, 20255, 29889, 450, 20255, 4076, 8444, 29892, 13173, 29892, 322, 1248, 568, 6089, 304, 278, 1404, 29915, 29879, 5155, 29889, 2, 3148, 1001, 29901, 887, 526, 263, 15390, 20255, 3414, 287, 411, 12021, 292, 18378, 322, 1532, 29899, 262, 15628, 20890, 304, 278, 9132, 5155, 470, 9595, 29889, 5618, 338, 278, 18766, 411, 278, 22583, 310, 9377, 18322, 29963, 885, 550, 29973, 319, 1799, 9047, 13566, 29901, 29871, 450, 22583, 310, 9377, 18322, 29963, 885, 550, 338, 18066, 292, 2861, 304, 7200, 7977, 15786, 29892, 5520, 1274, 23493, 3064, 29892, 322, 6590, 7200, 10884, 24238, 29879, 29889, 402, 2547, 297, 848, 1274, 23493, 2861, 304, 10977, 10884, 29892, 1316, 408, 269, 5753, 3076, 29892, 508, 367, 7282, 322, 1009, 1904, 292, 7415, 12187, 363, 9150, 22583, 29889, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2023-11-01 10:40:49.054 | DEBUG | main:main:1261 - Decode input_ids[0]:A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.USER: How does Facebook make money? ASSISTANT: Facebook makes most of its money through advertising. Advertisers pay the company to show ads to its users, based on their interests, behaviors, location, and other data collected by Facebook. This is the reason Facebook is free for its users; the company makes money from the advertisements.Facebook also earns some revenue from its other services such as Facebook Marketplace, where users can buy and sell products, and Facebook Gaming, where the company takes a cut of the revenue generated from games played on its platform. There are also some other minor sources including premium services for businesses, interest earned by its cash reserve, and selling Oculus VR hardware.
In summary, the majority of Facebook's revenue comes from advertising, and the company's ability to target ads to its billions of users is what makes it one of the most valuable advertising channels in the world.