I'm attempting to fine-tune FastChat T5 locally using the command:
torchrun --nproc_per_node=1 --master_port=9778 fastchat/train/train_flant5.py \
    --model_name_or_path {my_path}/test_fastchat/fastchat-t5-3b-v1.0 \
    --data_path ./data/dummy_conversation.json \
    --bf16 True \
    --output_dir ./checkpoints_flant5_3b \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 300 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap T5Block \
    --tf32 True \
    --model_max_length 2048 \
    --preprocessed_path ./preprocessed_data/processed.json \
    --gradient_checkpointing True
However, during the execution, I encounter the following traceback:
WARNING:root:Loading data...
WARNING:root:Formatting inputs...
Traceback (most recent call last):
  File "{my_path}/test_fastchat/FastChat/fastchat/train/train_flant5.py", line 436, in <module>
    train()
  File "{my_path}/test_fastchat/FastChat/fastchat/train/train_flant5.py", line 422, in train
    data_module = make_supervised_data_module(tokenizer=tokenizer, data_args=data_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "{my_path}/test_fastchat/FastChat/fastchat/train/train_flant5.py", line 383, in make_supervised_data_module
    train_dataset = dataset_cls(
                    ^^^^^^^^^^^^
  File "{my_path}/test_fastchat/FastChat/fastchat/train/train_flant5.py", line 301, in __init__
    data_dict = preprocess(sources, tokenizer)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "{my_path}/test_fastchat/FastChat/fastchat/train/train_flant5.py", line 234, in preprocess
    header = f"{default_conversation.system}\n\n"
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Conversation' object has no attribute 'system'
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2294534 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2294533) of binary: {my_home}/anaconda3/envs/fast-chat/bin/python
Traceback (most recent call last):
  File "{my_home}/anaconda3/envs/fast-chat/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "{my_home}/anaconda3/envs/fast-chat/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "{my_home}/anaconda3/envs/fast-chat/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "{my_home}/anaconda3/envs/fast-chat/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "{my_home}/anaconda3/envs/fast-chat/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "{my_home}/anaconda3/envs/fast-chat/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
Environment:
OS: Ubuntu 22.04.2 LTS
Device: NVIDIA RTX A6000
PyTorch version: 2.0.1
CUDA version: 11.7
Expected Behavior:
The training process should execute without raising any AttributeError.
Actual Behavior:
The training halts due to the AttributeError related to the missing 'system' attribute in the Conversation object.
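To narrow this down, a small check along the following lines could show which system-prompt attribute a Conversation object actually exposes. This is only a sketch under assumptions: `StubConversation` is a hypothetical stand-in so the snippet runs without FastChat installed, `get_system_prompt` is a helper name I made up, and `system_message` is my guess at what the attribute might be called in the installed version, not something I have confirmed.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the real Conversation class, used only so this
# sketch runs on its own. The field name `system_message` is an assumption.
@dataclass
class StubConversation:
    system_message: str = "A chat between a curious user and an assistant."


def get_system_prompt(conv, candidates=("system", "system_message")):
    """Return the first string-valued attribute from `candidates` found on `conv`."""
    for name in candidates:
        value = getattr(conv, name, None)
        if isinstance(value, str):
            return value
    # Nothing matched: list the public attributes to help spot a rename.
    public = [a for a in dir(conv) if not a.startswith("_")]
    raise AttributeError(
        f"{type(conv).__name__!r} has none of {candidates}; available: {public}"
    )


conv = StubConversation()
# Mirrors the failing line in preprocess(), but with the fallback lookup.
header = f"{get_system_prompt(conv)}\n\n"
print(repr(header))
```

Running the same helper against the real `default_conversation` object should either return the prompt or raise an error that lists the attributes that do exist, which would indicate whether the attribute was renamed.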
Additional Context:
This is my first attempt to train FastChat T5 on my local machine, and I followed the setup instructions in the documentation. I have not modified any files; I am only trying to run the code as provided to confirm that it executes.
How should I go about resolving this issue?