laichen2021 closed this issue 11 months ago.
You can refer to the environment setup here.
Is fine-tuning tied to the specific model? qwen_7b_chat runs fine, but Qwen-7B-Chat-Int4 fails.
Try pulling the latest code.
From what I can see, the old and new code are incompatible. I need to pin a fixed version; I can't keep tracking every change.
What incompatibility are you referring to? Could you describe it in detail?
Could you describe the incompatibility problem? In my testing here, a checkpoint from the old version loads fine in the new version.
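In case it helps to verify on your side, here is a minimal sketch of loading an old LoRA checkpoint under the current ms-swift. The checkpoint path is a placeholder, and the exact API surface may differ slightly between swift versions:

```python
# A minimal sketch, not a guaranteed recipe: the checkpoint path below is a
# placeholder; substitute the output dir your old swift version wrote to.
from modelscope import AutoModelForCausalLM
from swift import Swift

model = AutoModelForCausalLM.from_pretrained(
    'qwen/Qwen-7B-Chat-Int4', trust_remote_code=True)
# Attach the LoRA adapter weights saved by the older version:
model = Swift.from_pretrained(
    model, 'output/qwen-7b-chat-int4/vX-xxxxxxxx/checkpoint-xx')
```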
You can try `pip uninstall ms-swift`, then `pip install .` to reinstall from source.
```
$ CUDA_VISIBLE_DEVICES=0 swift sft --model_id_or_path qwen/Qwen-7B-Chat-Int4 --dataset blossom-math-zh
2023-12-05 10:33:07,915 - modelscope - INFO - PyTorch version 2.0.0+cu117 Found.
2023-12-05 10:33:07,916 - modelscope - INFO - Loading ast index from /home/admin/workspace/aop_lab/app_data/.cache/ast_indexer
2023-12-05 10:33:08,028 - modelscope - INFO - Loading done! Current index file version is 1.9.5, with md5 02df781eb58661cd6278e5eb420ca5f2 and a total number of 945 components indexed
run sh: python /home/admin/miniconda3/lib/python3.10/site-packages/swift/cli/sft.py --model_id_or_path qwen/Qwen-7B-Chat-Int4 --dataset blossom-math-zh
2023-12-05 10:33:13,083 - modelscope - INFO - PyTorch version 2.0.0+cu117 Found.
2023-12-05 10:33:13,085 - modelscope - INFO - Loading ast index from /home/admin/workspace/aop_lab/app_data/.cache/ast_indexer
2023-12-05 10:33:13,399 - modelscope - INFO - Loading done! Current index file version is 1.9.5, with md5 02df781eb58661cd6278e5eb420ca5f2 and a total number of 945 components indexed
[INFO:swift] output_dir: /home/admin/workspace/aop_lab/app_source/output/qwen-7b-chat-int4/v0-20231205-103314
[INFO:swift] Setting template_type: chatml
[INFO:swift] Setting hub_model_id: qwen-7b-chat-int4-lora
[INFO:swift] args: SftArguments(model_type='qwen-7b-chat-int4', model_id_or_path='qwen/Qwen-7B-Chat-Int4', model_revision='master', model_cache_dir=None, sft_type='lora', tuner_backend='swift', template_type='chatml', output_dir='/home/admin/workspace/aop_lab/app_source/output/qwen-7b-chat-int4/v0-20231205-103314', add_output_dir_suffix=True, ddp_backend='nccl', seed=42, resume_from_checkpoint=None, dtype='fp16', dataset=['blossom-math-zh'], dataset_seed=42, dataset_test_ratio=0.01, train_dataset_sample=20000, system='you are a helpful assistant!', max_length=2048, check_dataset_strategy='none', custom_train_dataset_path=None, custom_val_dataset_path=None, quantization_bit=0, bnb_4bit_comp_dtype='fp16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, lora_target_modules=['c_attn'], lora_rank=8, lora_alpha=32, lora_dropout_p=0.05, gradient_checkpointing=True, deepspeed_config_path=None, batch_size=1, eval_batch_size=1, num_train_epochs=1, max_steps=-1, optim='adamw_torch', learning_rate=0.0001, weight_decay=0.01, gradient_accumulation_steps=16, max_grad_norm=1.0, predict_with_generate=False, lr_scheduler_type='cosine', warmup_ratio=0.05, eval_steps=50, save_steps=50, only_save_model=False, save_total_limit=2, logging_steps=5, dataloader_num_workers=1, push_to_hub=False, hub_model_id='qwen-7b-chat-int4-lora', hub_private_repo=True, push_hub_strategy='push_best', hub_token=None, test_oom_error=False, use_flash_attn=None, ignore_args_error=False, logging_dir='/home/admin/workspace/aop_lab/app_source/output/qwen-7b-chat-int4/v0-20231205-103314/runs', report_to=None, max_new_tokens=2048, do_sample=True, temperature=0.9, top_k=20, top_p=0.9, repetition_penalty=1.05)
device_count: 1
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Global seed set to 42
[WARNING:modelscope] Using the master branch is fragile, please use it with caution!
[INFO:modelscope] Use user-specified model revision: master
[INFO:swift] use gptq, ignore bnb arguments
Using `disable_exllama` is deprecated and will be removed in version 4.37. Use `use_exllama` instead and specify the version with `exllama_config`. The value of `use_exllama` will be overwritten by `disable_exllama` passed in `GPTQConfig` or stored in your config file.
CUDA extension not installed.
CUDA extension not installed.
[INFO:swift] model_config: QWenConfig {
  "_name_or_path": "/home/admin/workspace/aop_lab/app_data/.cache/qwen/Qwen-7B-Chat-Int4",
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": true,
  "fp32": false,
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 22016,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "onnx_safe": null,
  "quantization_config": {
    "bits": 4,
    "damp_percent": 0.01,
    "desc_act": false,
    "group_size": 128,
    "model_file_base_name": "model",
    "model_name_or_path": null,
    "quant_method": "gptq",
    "static_groups": false,
    "sym": true,
    "true_sequential": true
  },
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "torch_dtype": "float16",
  "transformers_version": "4.35.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}
You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute and has already quantized weights. However, loading attributes (e.g. use_exllama, exllama_config, use_cuda_fp16, max_input_length) will be overwritten with the one you passed to `from_pretrained`. The rest will be ignored.
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|████████████████████████████████| 3/3 [00:04<00:00, 1.36s/it]
Traceback (most recent call last):
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/cli/sft.py", line 4, in <module>
    sft_main()
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/llm/utils/utils.py", line 194, in x_main
    return llm_x(args, **kwargs)
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/llm/sft.py", line 77, in llm_sft
    model = Swift.prepare_model(model, lora_config)
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/tuners/base.py", line 495, in prepare_model
    return SwiftModel(model, config, **kwargs)
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/tuners/base.py", line 60, in __init__
    self.adapters[DEFAULT_ADAPTER] = self._prepare_model(
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/tuners/base.py", line 299, in _prepare_model
    return SWIFT_MAPPING[config.swift_type][1].prepare_model(
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/tuners/lora.py", line 226, in prepare_model
    LoRA._dynamic_patch_lora(
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/tuners/lora.py", line 331, in _dynamic_patch_lora
    lora_module = QuantLinear(
  File "/home/admin/miniconda3/lib/python3.10/site-packages/swift/tuners/lora.py", line 107, in __init__
    self.active_adapter = adapter_name
  File "/home/admin/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1674, in __setattr__
    super().__setattr__(name, value)
AttributeError: can't set attribute 'active_adapter'
```
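For reference, `AttributeError: can't set attribute` is the generic error Python raises when an assignment hits a read-only property (one with no setter) defined on a parent class. The traceback suggests swift's `QuantLinear` wrapper inherits such an `active_adapter` property from a newer dependency, which would make the plain assignment on lora.py line 107 fail; that reading is consistent with the advice above to pull the latest code. A minimal illustration of the failure mode (made-up class names, not ms-swift code):

```python
# Illustration only: assigning through a read-only inherited property.
class AdapterMixin:
    @property
    def active_adapter(self):
        # Read-only property: no @active_adapter.setter is defined.
        return self._adapter_name

class QuantLinearLike(AdapterMixin):
    def __init__(self, adapter_name: str):
        # This assignment resolves to the read-only property on the parent
        # class instead of creating an instance attribute, so it raises.
        self.active_adapter = adapter_name

QuantLinearLike('default')  # AttributeError: can't set attribute 'active_adapter'
```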