Running tokenizer on train dataset: 0%| | 0/1000 [00:00<?, ? examples/s]
Traceback (most recent call last):
File "/MedicalGPT/supervised_finetuning.py", line 1383, in <module>
main()
File "/MedicalGPT/supervised_finetuning.py", line 1094, in main
train_dataset = train_dataset.shuffle().map(
File "/anaconda3/envs/medicalgpt/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 592, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/anaconda3/envs/medicalgpt/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 557, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/anaconda3/envs/medicalgpt/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3093, in map
for rank, done, content in Dataset._map_single(**dataset_kwargs):
File "/anaconda3/envs/medicalgpt/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3470, in _map_single
batch = apply_function_on_filtered_inputs(
File "/anaconda3/envs/medicalgpt/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 3349, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "/MedicalGPT/supervised_finetuning.py", line 1043, in preprocess_function
for dialog in get_dialog(examples):
File "/MedicalGPT/supervised_finetuning.py", line 1020, in get_dialog
for i, source in enumerate(examples['conversation']):
File "/anaconda3/envs/medicalgpt/lib/python3.9/site-packages/datasets/formatting/formatting.py", line 270, in __getitem__
value = self.data[key]
KeyError: 'conversation'
How can I fix this?
The command I ran:
!python supervised_finetuning.py \
    --model_type llama \
    --model_name_or_path ./merged-pt \
    --train_file_dir ./data/finetune \
    --validation_file_dir ./data/finetune \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --use_peft True \
    --fp16 \
    --max_train_samples 1000 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.05 \
    --weight_decay 0.05 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --output_dir outputs-sft-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True
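The traceback shows `get_dialog` reading `examples['conversation']`, so the `KeyError` suggests that at least one file loaded from `--train_file_dir` has no field with that exact name. Below is a minimal diagnostic sketch, not part of MedicalGPT, for checking which files are affected. It assumes the data files are JSON Lines (one JSON object per line); the function name `missing_key_report` is hypothetical, and the key name is taken from the traceback rather than from the project's documentation.

```python
import json
from pathlib import Path


def missing_key_report(data_dir, key="conversation"):
    """Map each file that has a record without `key` to the field names it does contain.

    Assumption: every *.json / *.jsonl file in data_dir holds one JSON object per line.
    """
    report = {}
    for path in sorted(Path(data_dir).glob("*.json*")):
        seen = set()
        missing = False
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                if not isinstance(record, dict):
                    missing = True  # not an object, so it cannot carry the key
                    continue
                seen.update(record)  # collect the field names of this record
                if key not in record:
                    missing = True
        if missing:
            report[path.name] = sorted(seen)
    return report


if __name__ == "__main__":
    # Directory taken from the --train_file_dir argument in the command above.
    print(missing_key_report("./data/finetune"))
```

If a file shows up in the report, the listed field names should indicate whether the data uses a different key or a different format than this version of the script expects. In that case, either the data or the key read in `get_dialog` would need to be aligned.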