Closed bravelll closed 9 months ago
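This error means the GPU has run out of memory. What machine are you using?
I have four 3090 cards with 24 GB of memory each. Can I fine-tune the codefuse-ai/CodeFuse-CodeLlama-34B-4bits model?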
codefuse-ai/CodeFuse-CodeLlama-34B-4bits does not seem to support fine-tuning. With lora_mp you can fine-tune codefuse-ai/CodeFuse-CodeLlama-34B instead.
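A rough back-of-the-envelope check of why a single 24 GB card runs out of memory, and why spreading the model over four cards may help. This is only a sketch: it counts weight storage alone (2 bytes per parameter for fp16, 0.5 bytes for 4-bit), assumes 34 billion parameters based on the "34B" in the model name, and ignores activations, LoRA parameters, optimizer state, and quantization overhead.

```python
# Rough weight-only memory estimate; illustrative, not a measurement.
GIB = 1024 ** 3


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


NUM_PARAMS = 34e9      # assumption: taken from the "34B" in the model name
GPU_MEMORY_GIB = 24    # one 3090, as stated in the thread
NUM_GPUS = 4

fp16_gib = weight_memory_gib(NUM_PARAMS, 2.0)   # fp16: 2 bytes per parameter
int4_gib = weight_memory_gib(NUM_PARAMS, 0.5)   # 4-bit: 0.5 bytes per parameter

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
print("fp16 fits on one GPU:     ", fp16_gib < GPU_MEMORY_GIB)
print("fp16 fits across all GPUs:", fp16_gib < GPU_MEMORY_GIB * NUM_GPUS)
```

Under these assumptions the fp16 weights alone exceed one card but fit within the combined memory of four, which is consistent with the suggestion to split the model across GPUs.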
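I haven't used lora_mp. Is there a concrete example you could share? Thanks!
You can refer to this example: https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen_72b_chat/lora_mp
I don't see which parameter differs in the shell scripts there. Which parameter mainly controls lora_mp?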
When the number of GPUs in CUDA_VISIBLE_DEVICES is an integer multiple of world_size, mp or mp_ddp is enabled automatically.
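The rule in the reply above can be sketched as a small function. This illustrates one reading of the stated rule and is not swift's actual code: the names `visible_gpu_count` and `pick_parallel_mode` and the returned labels are hypothetical, and it assumes `world_size` is the number of training processes (1 when launching with plain `python`).

```python
def visible_gpu_count(cuda_visible_devices: str) -> int:
    """Count the device ids in a CUDA_VISIBLE_DEVICES-style string."""
    return len([d for d in cuda_visible_devices.split(",") if d.strip()])


def pick_parallel_mode(cuda_visible_devices: str, world_size: int = 1) -> str:
    """Hypothetical helper: classify a launch according to the rule in the thread."""
    n_gpus = visible_gpu_count(cuda_visible_devices)
    if n_gpus == 0 or world_size < 1 or n_gpus % world_size != 0:
        # The thread does not say what happens when the count is not a multiple.
        return "unspecified"
    gpus_per_process = n_gpus // world_size
    if gpus_per_process == 1:
        # One GPU per process: nothing to split the model across (assumption).
        return "single" if world_size == 1 else "ddp"
    # Several GPUs per process: the model can be split across them.
    return "mp" if world_size == 1 else "mp_ddp"


print(pick_parallel_mode("0"))             # single
print(pick_parallel_mode("0,1,2,3"))       # mp
print(pick_parallel_mode("0,1,2,3", 2))    # mp_ddp
```

Under this reading, a launch that exposes only one device (for example `CUDA_VISIBLE_DEVICES=0`) gives each process a single GPU, so no model parallelism would be enabled.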
I'll give it a try, thanks!
PYTHONPATH=../../.. \
CUDA_VISIBLE_DEVICES=0 \
python llm_sft.py \
    --model_type codefuse-codellama-34b-chat \
    --sft_type lora \
    --tuner_backend swift \
    --template_type codefuse-codellama \
    --dtype fp16 \
    --output_dir output \
    --custom_train_dataset_path /u01/liuys/work/datasets/data/java-1k.jsonl \
    --custom_val_dataset_path /u01/liuys/work/datasets/data/java-100.jsonl \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 4096 \
    --check_dataset_strategy warning \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.01 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 16 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true \
    --push_to_hub false \
    --hub_model_id codefuse-codellama-34b-chat-lora \
    --hub_private_repo true \
    --hub_token 'your-sdk-token'

The error is as follows:
Traceback (most recent call last):
  File "/u01/liuys/swift/examples/pytorch/llm/llm_sft.py", line 7, in <module>
    output = sft_main()
  File "/u01/liuys/swift/swift/llm/utils/utils.py", line 194, in x_main
    return llm_x(args, **kwargs)
  File "/u01/liuys/swift/swift/llm/sft.py", line 253, in llm_sft
    trainer = Seq2SeqTrainer(
  File "/u01/liuys/swift/swift/trainers/trainers.py", line 29, in __init__
    super().__init__(*args, **kwargs)
  File "/u01/liuys/swift/swift/trainers/mixin.py", line 283, in __init__
    super().__init__(model, args, data_collator, train_dataset,
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/trainer_seq2seq.py", line 56, in __init__
    super().__init__(
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/trainer.py", line 481, in __init__
    self._move_model_to_device(model, args.device)
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/transformers/trainer.py", line 716, in _move_model_to_device
    model = model.to(device)
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/u01/liuys/anaconda3/envs/ms-swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
The jsonl data format is as follows:
{"query": "// language: Java\n// 日志一条信息", "response": "public synchronized void info(String msg){\n LogRecord record=new LogRecord(Level.INFO,msg);\n log(record);\n}"}
{"query": "// language: Java\n// 处理 gateway 接收器创建", "response": "public void handleGatewayReceiverCreate(GatewayReceiver recv) throws ManagementException {\n if (!isServiceInitialised(\"handleGatewayReceiverCreate\")) {\n return;\n }\n if (!recv.isManualStart()) {\n return;\n }\n createGatewayReceiverMBean(recv);\n}"}
{"query": "// language: Java\n// 这个方法将收到数据提供者的无论何时数据更改的通知", "response": "public void dataChanged(IDataProvider dataProvider);"}
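Before training, it can help to confirm that every line of the jsonl file parses and carries the two fields seen in the samples above. The following is a minimal sketch under that assumption: the helper name `validate_jsonl_lines` is hypothetical, and it only checks for string-valued `query` and `response` fields; whether swift accepts or requires other fields is not covered here.

```python
import json

# Assumption: only the two fields that appear in the samples are required.
REQUIRED_KEYS = ("query", "response")


def validate_jsonl_lines(lines):
    """Return a list of (line_number, problem) for records that look malformed."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "record is not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            if not isinstance(record.get(key), str):
                problems.append((lineno, f"missing or non-string {key!r}"))
    return problems


# Made-up sample records for illustration, not taken from the dataset.
sample = [
    '{"query": "// language: Java\\n// log a message", "response": "public void info(String msg);"}',
    '{"query": "no response field"}',
]
print(validate_jsonl_lines(sample))
```

To check a real file, pass an open file handle instead of the list, for example `validate_jsonl_lines(open(path, encoding="utf-8"))`, where `path` points to the training or validation file.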