hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

[QLoRA training error] QLoRA 4-bit training on a single H100 reports an error. #1710

Closed zzlgreat closed 9 months ago

zzlgreat commented 9 months ago

Reminder

Reproduction

Launch command:

CUDA_VISIBLE_DEVICES=0 accelerate launch src/train_bash.py \
    --stage pt \
    --model_name_or_path /home/ubuntu/models/deepseek-llm-67b-base \
    --do_train \
    --dataset wiki_demo \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir /home/ubuntu/models/deepsex-67b \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --lora_rank 256 \
    --quantization_bit 4

Error message:

12/02/2023 13:00:57 - WARNING - llmtuner.model.parser - We recommend enable upcast_layernorm in quantized training. 12/02/2023 13:00:57 - WARNING - llmtuner.model.parser - We recommend enable mixed precision training. 12/02/2023 13:00:57 - WARNING - llmtuner.model.parser - ddp_find_unused_parameters needs to be set as False for LoRA in DDP training. [INFO|training_args.py:1345] 2023-12-02 13:00:57,864 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors! [INFO|training_args.py:1798] 2023-12-02 13:00:57,865 >> PyTorch: setting up devices /home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/training_args.py:1711: FutureWarning: --push_to_hub_token is deprecated and will be removed in version 5 of 🤗 Transformers. Use --hub_token instead. warnings.warn( 12/02/2023 13:00:57 - INFO - llmtuner.model.parser - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, compute dtype: None 12/02/2023 13:00:57 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=no, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=4, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/home/ubuntu/models/deepsex-67b/runs/Dec02_13-00-57_209-20-157-113, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=steps, lr_scheduler_type=cosine, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, output_dir=/home/ubuntu/models/deepsex-67b, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=8, per_device_train_batch_size=4, predict_with_generate=False, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=[], resume_from_checkpoint=None, run_name=/home/ubuntu/models/deepsex-67b, save_on_each_node=False, save_safetensors=False, 
save_steps=1000, save_strategy=steps, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, sortish_sampler=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, ) 12/02/2023 13:00:57 - INFO - llmtuner.data.loader - Loading dataset wiki_demo.txt... Using custom data configuration default-7442adfc747cef6b Loading Dataset Infos from /home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/datasets/packaged_modules/text Overwrite dataset info from restored data version if exists. Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/text/default-7442adfc747cef6b/0.0.0/c4a140d10f020282918b5dd1b8a49f0104729c6177f60a6b49ec2a365ec69f34 Found cached dataset text (/home/ubuntu/.cache/huggingface/datasets/text/default-7442adfc747cef6b/0.0.0/c4a140d10f020282918b5dd1b8a49f0104729c6177f60a6b49ec2a365ec69f34) Loading Dataset info from /home/ubuntu/.cache/huggingface/datasets/text/default-7442adfc747cef6b/0.0.0/c4a140d10f020282918b5dd1b8a49f0104729c6177f60a6b49ec2a365ec69f34 [INFO|tokenization_utils_base.py:2013] 2023-12-02 13:00:58,282 >> loading file tokenizer.model [INFO|tokenization_utils_base.py:2013] 2023-12-02 13:00:58,282 >> loading file tokenizer.json [INFO|tokenization_utils_base.py:2013] 2023-12-02 13:00:58,282 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2013] 2023-12-02 13:00:58,282 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2013] 2023-12-02 13:00:58,282 >> loading file tokenizer_config.json [INFO|configuration_utils.py:713] 2023-12-02 13:00:58,417 >> loading configuration file /home/ubuntu/models/deepseek-llm-67b-base/config.json [INFO|configuration_utils.py:775] 2023-12-02 13:00:58,418 >> Model config LlamaConfig { "_name_or_path": "/home/ubuntu/models/deepseek-llm-67b-base", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 8192, "initializer_range": 0.02, "intermediate_size": 22016, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 64, "num_hidden_layers": 95, "num_key_value_heads": 8, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.34.1", "use_cache": true, "vocab_size": 102400 }

12/02/2023 13:00:58 - INFO - llmtuner.model.loader - Quantizing model to 4 bit. [INFO|modeling_utils.py:2990] 2023-12-02 13:00:58,431 >> loading weights file /home/ubuntu/models/deepseek-llm-67b-base/pytorch_model.bin.index.json [INFO|modeling_utils.py:1220] 2023-12-02 13:00:58,431 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16. [INFO|configuration_utils.py:770] 2023-12-02 13:00:58,431 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 }

[INFO|modeling_utils.py:3103] 2023-12-02 13:00:58,606 >> Detected 4-bit loading: activating 4-bit loading for this model Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [01:38<00:00, 7.05s/it] [INFO|modeling_utils.py:3775] 2023-12-02 13:02:37,775 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:3783] 2023-12-02 13:02:37,775 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/ubuntu/models/deepseek-llm-67b-base. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [INFO|configuration_utils.py:728] 2023-12-02 13:02:37,780 >> loading configuration file /home/ubuntu/models/deepseek-llm-67b-base/generation_config.json [INFO|configuration_utils.py:770] 2023-12-02 13:02:37,780 >> Generate config GenerationConfig { "bos_token_id": 100000, "eos_token_id": 100001 }

12/02/2023 13:02:38 - INFO - llmtuner.model.utils - Gradient checkpointing enabled. 12/02/2023 13:02:38 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA 12/02/2023 13:02:42 - INFO - llmtuner.model.loader - trainable params: 622592000 || all params: 68047593472 || trainable%: 0.9149 Running tokenizer on dataset: 0%| | 0/200 [00:00<?, ? examples/s][WARNING|tokenization_utils_base.py:3823] 2023-12-02 13:02:42,982 >> Token indices sequence length is longer than the specified maximum sequence length for this model (8591 > 4096). Running this sequence through the model will result in indexing errors Caching processed dataset at /home/ubuntu/.cache/huggingface/datasets/text/default-7442adfc747cef6b/0.0.0/c4a140d10f020282918b5dd1b8a49f0104729c6177f60a6b49ec2a365ec69f34/cache-8d841fb3df6ae430.arrow Running tokenizer on dataset: 100%|█████████████████████████████████████████████████████████████████████████████████| 200/200 [00:01<00:00, 183.21 examples/s] input_ids: [100000, 2219, 1280, 2001, 317, 245, 7083, 17293, 285, 8308, 344, 317, 54361, 739, 280, 10801, 285, 73702, 521, 76341, 11, 64122, 489, 6813, 280, 27917, 13, 1640, 1280, 2001, 8104, 327, 254, 59621, 280, 254, 1977, 11, 588, 359, 7432, 276, 330, 20887, 11, 57028, 11, 285, 29314, 13, 1733, 245, 42948, 2116, 12, 10794, 8308, 11, 6746, 331, 254, 77727, 2116, 280, 254, 7083, 9656, 11, 359, 317, 4308, 5734, 16381, 65054, 2001, 285, 40541, 9366, 41526, 2001, 372, 254, 40541, 9366, 19736, 334, 2885, 775, 9366, 90870, 8, 280, 254, 71092, 8308, 11, 285, 643, 245, 2955, 13092, 16149, 366, 9503, 12, 42394, 2001, 285, 90870, 13, 40074, 533, 8663, 279, 36292, 1673, 8970, 19074, 475, 1234, 1323, 254, 22105, 280, 8970, 4605, 11, 73497, 11, 410, 996, 25261, 13, 2991, 254, 8307, 280, 34504, 39727, 12792, 11, 54361, 40329, 7230, 10801, 839, 8728, 13, 9495, 21252, 280, 82090, 382, 2215, 418, 1503, 5923, 4345, 11, 4959, 82090, 2001, 27433, 473, 254, 2353, 76683, 469, 13, 11399, 254, 7317, 3222, 280, 254, 207, 16, 24, 393, 285, 254, 1022, 14651, 280, 254, 207, 17, 15, 393, 8295, 11, 254, 82090, 382, 8308, 78037, 279, 1094, 4373, 280, 254, 1843, 285, 661, 245, 4485, 5012, 279, 10702, 6, 31106, 327, 81609, 13, 40140, 82090, 382, 8616, 280, 2215, 9163, 2320, 437, 3463, 13, 1640, 1280, 1934, 463, 3443, 697, 279, 2971, 82218, 11, 1094, 37192, 279, 254, 8913, 6694, 3122, 11, 254, 13011, 14671, 6368, 285, 254, 12299, 14671, 6368, 11, 4318, 1225, 10528, 254, 1225, 280, 254, 11884, 2906, 280, 82090, 2001, 13, 685, 254, 1562, 14651, 280, 254, 207, 17, 15, 393, 285, 881, 254, 207, 17, 16, 292, 8295, 11, 254, 82090, 382, 8308, 643, 803, 597, 3955, 289, 2561, 691, 13, 2219, 1280, 2001, 41085, 245, 20219, 280, 32513, 279, 1835, 276, 2984, 895, 7173, 10171, 588, 481, 330, 41635, 14801, 881, 36844, 285, 40091, 32513, 26, 745, 317, 4485, 24089, 1439, 254, 984, 11, 588, 418, 10660, 43711, 13, 76382, 32513, 8223, 276, 3792, 1224, 10801, 285, 1977, 11, 2497, 3443, 245, 18757, 1947, 279, 254, 2882, 11, 1477, 40091, 32513, 8223, 276, 3663, 3385, 856, 274, 82090, 382, 8213, 744, 330, 837, 13, 1640, 1280, 382, 2215, 11, 23510, 11, 285, 10993, 21036, 463, 7222, 245, 697, 279, 16253, 5458, 280, 3807, 8213, 13, 22025, 40329, 280, 82090, 2001, 3433, 10545, 344, 359, 317, 32329, 34418, 11, 18757, 11, 410, 5355, 80274, 13, 97459, 2342, 11, 35718, 11, 285, 6525, 429, 48194, 5195, 6947, 280, 82090, 2001, 317, 473, 254, 37836, 13669, 274, 860, 71, 480, 11, 4569, 440, 12212, 245, 40939, 955, 15323, 280, 254, 15799, 274, 12, 8462, 12212, 2465, 285, 254, 1734, 670, 17675, 378, 8462, 
47982, 1, 410, 440, 81, 11920, 6419, 429, 30019, 570, 2001, 14390, 254, 80307, 1648, 344, 78159, 92459, 13, 1640, 1280, 2001, 6266, 279, 3517, 473, 207, 16, 21, 19, 17, 372, 82090, 5445, 285, 92459, 473, 207, 16, 20, 18, 24, 26, 3923, 3517, 58702, 10560, 2345, 245, 3078, 280, 19679, 13, 40140, 70368, 2383, 254, 6016, 22970, 38451, 704, 29328, 372, 82090, 1934, 11, 5802, 1860, 1108, 21574, 7480, 1313, 8208, 366, 3470, 82090, 1934, 13, 8410, 16469, 4940, 280, 254, 207, 16, 24, 393, 8295, 1108, 372, 8660, 2772, 7530, 334, 16, 22, 20, 21, 891, 16, 23, 18, 21, 8, 285, 37975, 1003, 260, 2237, 334, 16, 23, 15, 23, 891, 16, 23, 22, 16, 8, 744, 14171, 276, 254, 82090, 382, 44755, 280, 254, 2112, 8979, 548, 1216, 441, 938, 82090, 382, 410, 82090, 2001, 279, 16934, 4449, 410, 704, 25277, 13, 549, 1022, 7083, 42441, 276, 1282, 3177, 274, 82090, 382, 4811, 438, 24522, 12, 43408, 81313, 18498, 334, 16, 23, 15, 24, 891, 16, 23, 21, 20, 654, 33999, 254, 8970, 7392, 280, 82090, 2001, 279, 254, 5947, 12, 16, 24, 393, 8295, 13, 5512, 254, 207, 16, 23, 24, 15, 82, 285, 6278, 279, 7239, 11, 40541, 9366, 2001, 643, 2752, 803, 1222, 372, 245, 32046, 327, 82090, 2001, 285, 895, 938, 372, 245, 32046, 317, 1592, 3064, 4881, 254, 4794, 5110, 13, 4754, 58702, 280, 40541, 9366, 2001, 5006, 276, 3451, 3737, 2094, 12, 25649, 17293, 889, 11, 285, 2094, 12, 25649, 82090, 2001, 279, 2590, 317, 39198, 40541, 9366, 82090, 2001, 13, 7668, 254, 1639, 40541, 9366, 643, 803, 15052, 53061, 366, 82090, 2001, 11, 895, 4569, 643, 691, 5465, 68670, 366, 19738, 24842, 473, 1894, 27792, 91410, 5860, 11, 2847, 1572, 254, 2006, 21587, 285, 40541, 9366, 41526, 1934, 11, 779, 536, 441, 20582, 4449, 366, 3855, 29162, 3613, 1934, 410, 245, 353, 55014, 4709, 11, 285, 12459, 10912, 54594, 11, 779, 418, 15970, 10849, 366, 7377, 59468, 13, 18494, 11, 742, 82090, 1934, 938, 40541, 9366, 71092, 276, 4945, 82090, 2001, 6, 82, 6640, 72383, 285, 89284, 895, 10609, 366, 90870, 13, 1640, 1280, 2001, 317, 41635, 1222, 276, 7183, 254, 9503, 12, 9425, 29162, 19736, 280, 254, 71092, 8308, 13, 1640, 1280, 2001, 317, 66106, 276, 71092, 6813, 588, 418, 1977, 12, 29307, 410, 473, 2330, 13, 89093, 280, 82090, 2001, 6051, 9111, 82090, 2001, 6, 82, 71092, 24510, 285, 12157, 1011, 15291, 430, 6817, 26757, 302, 283, 475, 1439, 254, 984, 13, 4754, 30142, 7183, 82090, 2001, 372, 2497, 1313, 26352, 473, 20799, 2001, 11, 285, 1435, 1572, 54594, 285, 3613, 1934, 548, 691, 558, 11, 1477, 1094, 30142, 12766, 82090, 78, 12, 42394, 2001, 372, 245, 56297, 280, 82090, 382, 12771, 13, 7668, 18164, 276, 254, 1977, 317, 6327, 276, 82090, 382, 2215, 11, 21099, 82090, 2001, 317, 441, 274, 3244, 5266, 327, 30142, 11, 372, 745, 317, 245, 2603, 280, 8317, 3264, 30142, 285, 82090, 1934, 331, 254, 3502, 11, 285, 3947, 20408, 28725, 82090, 2001, 7860, 18417, 13, 13061, 7505, 2595, 4899, 3433, 254, 543, 327, 245, 2170, 12, 1651, 2244, 489, 8213, 11, 254, 37384, 280, 254, 1977, 33757, 11, 254, 11233, 344, 3807, 4910, 5181, 12638, 276, 2639, 279, 410, 5635, 7230, 1108, 245, 2170, 12, 1651, 2244, 489, 8213, 11, 285, 245, 16513, 331, 946, 276, 1047, 276, 21013, 254, 7173, 280, 92459, 13, 21205, 7805, 12] inputs: <|begin▁of▁sentence|>Anarchism is a political philosophy and movement that is sceptical of authority and rejects all involuntary, coercive forms of hierarchy. Anarchism calls for the abolition of the state, which it holds to be unnecessary, undesirable, and harmful. 
As a historically left-wing movement, placed on the farthest left of the political spectrum, it is usually described alongside communalism and libertarian Marxism as the libertarian wing (libertarian socialism) of the socialist movement, and has a strong historical association with anti-capitalism and socialism.Humans lived in societies without formal hierarchies long before the establishment of formal states, realms, or empires. With the rise of organised hierarchical bodies, scepticism toward authority also rose. Although traces of anarchist thought are found throughout history, modern anarchism emerged from the Enlightenment. During the latter half of the 19th and the first decades of the 20th century, the anarchist movement flourished in most parts of the world and had a significant role in workers' struggles for emancipation. Various anarchist schools of thought formed during this period. Anarchists have taken part in several revolutions, most notably in the Paris Commune, the Russian Civil War and the Spanish Civil War, whose end marked the end of the classical era of anarchism. In the last decades of the 20th and into the 21st century, the anarchist movement has been resurgent once more.Anarchism employs a diversity of tactics in order to meet its ideal ends which can be broadly separated into revolutionary and evolutionary tactics; there is significant overlap between the two, which are merely descriptive. Revolutionary tactics aim to bring down authority and state, having taken a violent turn in the past, while evolutionary tactics aim to prefigure what an anarchist society would be like. Anarchist thought, criticism, and praxis have played a part in diverse areas of human society. Criticism of anarchism include claims that it is internally inconsistent, violent, or utopian.Etymology, terminology, and definition The etymological origin of anarchism is from the Ancient Greek anarkhia, meaning "without a ruler", composed of the prefix an- ("without") and the word arkhos ("leader" or "ruler"). The suffix -ism denotes the ideological current that favours anarchy. Anarchism appears in English from 1642 as anarchisme and anarchy from 1539; early English usages emphasised a sense of disorder. Various factions within the French Revolution labelled their opponents as anarchists, although few such accused shared many views with later anarchists. Many revolutionaries of the 19th century such as William Godwin (1756–1836) and Wilhelm Weitling (1808–1871) would contribute to the anarchist doctrines of the next generation but did not use anarchist or anarchism in describing themselves or their beliefs.The first political philosopher to call himself an anarchist () was Pierre-Joseph Proudhon (1809–1865), marking the formal birth of anarchism in the mid-19th century. Since the 1890s and beginning in France, libertarianism has often been used as a synonym for anarchism and its use as a synonym is still common outside the United States. Some usages of libertarianism refer to individualistic free-market philosophy only, and free-market anarchism in particular is termed libertarian anarchism.While the term libertarian has been largely synonymous with anarchism, its meaning has more recently diluted with wider adoption from ideologically disparate groups, including both the New Left and libertarian Marxists, who do not associate themselves with authoritarian socialists or a vanguard party, and extreme cultural liberals, who are primarily concerned with civil liberties. 
Additionally, some anarchists use libertarian socialist to avoid anarchism's negative connotations and emphasise its connections with socialism. Anarchism is broadly used to describe the anti-authoritarian wing of the socialist movement. Anarchism is contrasted to socialist forms which are state-oriented or from above. Scholars of anarchism generally highlight anarchism's socialist credentials and criticise attempts at creating dichotomies between the two. Some scholars describe anarchism as having many influences from liberalism, and being both liberals and socialists but more so, while most scholars reject anarcho-capitalism as a misunderstanding of anarchist principles.While opposition to the state is central to anarchist thought, defining anarchism is not an easy task for scholars, as there is a lot of discussion among scholars and anarchists on the matter, and various currents perceive anarchism slightly differently. Major definitional elements include the will for a non-coercive society, the rejection of the state apparatus, the belief that human nature allows humans to exist in or progress toward such a non-coercive society, and a suggestion on how to act to pursue the ideal of anarchy.HistoryPre-

Traceback (most recent call last):
  File "/home/ubuntu/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/home/ubuntu/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/train/tuner.py", line 24, in run_exp
    run_pt(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/home/ubuntu/LLaMA-Factory/src/llmtuner/train/pt/workflow.py", line 41, in run_pt
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1591, in train
    return inner_training_loop(
  File "/home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1726, in _inner_training_loop
    model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
  File "/home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/accelerate/accelerator.py", line 1213, in prepare
    result = tuple(
  File "/home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/accelerate/accelerator.py", line 1214, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/accelerate/accelerator.py", line 1094, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/home/ubuntu/anaconda3/envs/llm/lib/python3.10/site-packages/accelerate/accelerator.py", line 1324, in prepare_model
    raise ValueError(
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}

The strange thing is that I am indeed running with 4-bit quantization, and the model loads onto the GPU in 4-bit precision without issue, yet the error complains about 8-bit precision.
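For context: the ValueError is raised by accelerate's prepare_model() whenever a bitsandbytes-quantized model is not sitting on the device the trainer is about to use, and the message text appears to be shared between the 8-bit and 4-bit paths, which would explain why a 4-bit run reports an "8-bit" error. Below is a minimal sketch of what the message asks for, i.e. loading the quantized model with the whole device_map pinned to the current CUDA device. The BitsAndBytesConfig values are assumptions mirroring --quantization_bit 4, not the exact code path LLaMA-Factory takes.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed 4-bit settings mirroring --quantization_bit 4; LLaMA-Factory may pick
# a different compute dtype or quant type internally.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "/home/ubuntu/models/deepseek-llm-67b-base",
    quantization_config=bnb_config,
    # Pin every module to the current training device so accelerate's
    # prepare_model() does not reject the quantized model during Trainer setup.
    device_map={"": torch.cuda.current_device()},
    torch_dtype=torch.bfloat16,
)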

Expected behavior

No response

System Info

No response

Others

No response

hiyouga commented 9 months ago

Update the code and try again.
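A minimal sketch of what updating amounts to, assuming the repository was cloned with git at the path shown in the traceback and the same conda environment is active:

cd /home/ubuntu/LLaMA-Factory
git pull
# re-sync dependencies in case the fix bumped a package version
pip install -r requirements.txt --upgrade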

zzlgreat commented 9 months ago

Update the code and try again.

Resolved after updating the code.