[Closed] MisakaMikoto-o closed this issue 3 months ago
Where does the quantized model come from?
I'm getting the same error.
$ E:/ygf/swift/uav/GLM4V_sft.sh
run sh: python C:\Users\jxny02\anaconda3\envs\ygf_swift\Lib\site-packages\swift\cli\sft.py --model_type glm4v-9b-chat --model_id_or_path E:/ygf/swift/GLM-4V-9B-chat --sft_type lora --tuner_backend peft --template_type AUTO --dtype AUTO --output_dir E:/ygf/swift/output --dataset E:/ygf/swift/uav/Yi_1v1.json --train_dataset_sample -1 --num_train_epochs 1 --max_length 2048 --check_dataset_strategy warning --lora_rank 8 --lora_alpha 32 --lora_dropout_p 0.05 --lora_target_modules DEFAULT --gradient_checkpointing true --batch_size 1 --weight_decay 0.1 --learning_rate 1e-4 --gradient_accumulation_steps 16 --max_grad_norm 0.5 --warmup_ratio 0.03 --eval_steps 100 --save_steps 100 --save_total_limit 2 --logging_steps 100 --use_flash_attn false
[INFO:swift] Successfully registered C:\Users\jxny02\anaconda3\envs\ygf_swift\Lib\site-packages\swift\llm\data\dataset_info.json
[INFO:swift] Start time of running main: 2024-07-22 20:11:46.089474
[INFO:swift] Setting template_type: glm4v
[INFO:swift] Setting args.lazy_tokenize: True
[INFO:swift] Setting args.dataloader_num_workers: 0
[INFO:swift] output_dir: E:\ygf\swift\output\glm4v-9b-chat\v0-20240722-201146
[INFO:swift] args: SftArguments(model_type='glm4v-9b-chat', model_id_or_path='E:\ygf\swift\GLM-4V-9B-chat', model_revision='master', sft_type='lora', freeze_parameters=0.0, additional_trainable_parameters=[], tuner_backend='peft', template_type='glm4v', output_dir='E:\ygf\swift\output\glm4v-9b-chat\v0-20240722-201146', add_output_dir_suffix=True, ddp_backend=None, ddp_find_unused_parameters=None, ddp_broadcast_buffers=None, seed=42, resume_from_checkpoint=None, resume_only_model=False, ignore_data_skip=False, dtype='bf16', packing=False, dataset=['E:/ygf/swift/uav/Yi_1v1.json'], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, use_loss_scale=False, loss_scale_config_path='C:\Users\jxny02\anaconda3\envs\ygf_swift\Lib\site-packages\swift\llm\agent\default_loss_scale_config.json', system=None, tools_prompt='react_en', max_length=2048, truncation_strategy='delete', check_dataset_strategy='warning', model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, lora_target_modules=['self_attention.query_key_value'], lora_rank=8, lora_alpha=32, lora_dropout_p=0.05, lora_bias_trainable='none', lora_modules_to_save=[], lora_dtype='AUTO', lora_lr_ratio=None, use_rslora=False, use_dora=False, init_lora_weights='true', rope_scaling=None, boft_block_size=4, boft_block_num=0, boft_n_butterfly_factor=1, boft_target_modules=['DEFAULT'], boft_dropout=0.0, boft_modules_to_save=[], vera_rank=256, vera_target_modules=['DEFAULT'], vera_projection_prng_key=0, vera_dropout=0.0, vera_d_initial=0.1, vera_modules_to_save=[], adapter_act='gelu', adapter_length=128, use_galore=False, galore_rank=128, galore_target_modules=None, galore_update_proj_gap=50, galore_scale=1.0, galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, adalora_target_r=8, adalora_init_r=12, adalora_tinit=0, adalora_tfinal=0, adalora_deltaT=1, adalora_beta1=0.85, adalora_beta2=0.85, adalora_orth_reg_weight=0.5, ia3_target_modules=['DEFAULT'], ia3_feedforward_modules=[], ia3_modules_to_save=[], llamapro_num_new_blocks=4, llamapro_num_groups=None, neftune_noise_alpha=None, neftune_backend='transformers', lisa_activated_layers=0, lisa_step_interval=20, gradient_checkpointing=True, deepspeed=None, batch_size=1, eval_batch_size=1, num_train_epochs=1, max_steps=-1, optim='adamw_torch', adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, learning_rate=0.0001, weight_decay=0.1, gradient_accumulation_steps=16, max_grad_norm=0.5, predict_with_generate=False, lr_scheduler_type='linear', warmup_ratio=0.03, eval_steps=100, save_steps=100, save_only_model=False, save_total_limit=2, logging_steps=100, dataloader_num_workers=0, dataloader_pin_memory=True, dataloader_drop_last=False, push_to_hub=False, hub_model_id=None, hub_token=None, hub_private_repo=False, push_hub_strategy='push_best', test_oom_error=False, disable_tqdm=False, lazy_tokenize=True, preprocess_num_proc=1, use_flash_attn=False, ignore_args_error=False, check_model_is_latest=True, logging_dir='E:\ygf\swift\output\glm4v-9b-chat\v0-20240722-201146/runs', report_to=['tensorboard'], acc_strategy='token', save_on_each_node=True, evaluation_strategy='steps', save_strategy='steps', save_safetensors=True, gpu_memory_fraction=None, include_num_input_tokens_seen=False, local_repo_path=None, custom_register_path=None, custom_dataset_info=None, device_map_config_path=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, fsdp='', fsdp_config=None, sequence_parallel_size=1, model_layer_cls_name=None, metric_warmup_step=0, fsdp_num=1, per_device_train_batch_size=None, per_device_eval_batch_size=None, eval_strategy=None, self_cognition_sample=0, train_dataset_mix_ratio=0.0, train_dataset_mix_ds=['ms-bench'], train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, only_save_model=None, neftune_alpha=None, deepspeed_config_path=None, model_cache_dir=None, custom_train_dataset_path=[], custom_val_dataset_path=[])
[INFO:swift] Global seed set to 42
device_count: 1
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Loading the model using model_dir: E:\ygf\swift\GLM-4V-9B-chat
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████| 15/15 [02:34<00:00, 10.33s/it]
[INFO:swift] model.max_model_len: 8192
[INFO:swift] model_config: ChatGLMConfig {
"_name_or_path": "E:\ygf\swift\GLM-4V-9B-chat",
"add_bias_linear": false,
"add_qkv_bias": true,
"apply_query_key_layer_scaling": true,
"apply_residual_connection_post_layernorm": false,
"architectures": [
"ChatGLMModel"
],
"attention_dropout": 0.0,
"attention_softmax_in_fp32": true,
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForCausalLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSequenceClassification": "modeling_chatglm.ChatGLMForSequenceCl
assification"
},
"bias_dropout_fusion": true,
"boi_token_id": 151339,
"classifier_dropout": null,
"eoi_token_id": 151340,
"eos_token_id": [
151329,
151336,
151338
],
"ffn_hidden_size": 13696,
"fp32_residual_connection": false,
"hidden_dropout": 0.0,
"hidden_size": 4096,
"kv_channels": 128,
"layernorm_epsilon": 1.5625e-07,
"model_type": "chatglm",
"multi_query_attention": true,
"multi_query_group_num": 2,
"num_attention_heads": 32,
"num_layers": 40,
"original_rope": true,
"pad_token_id": 151329,
"padded_vocab_size": 151552,
"post_layer_norm": true,
"pre_seq_len": null,
"prefix_projection": false,
"rmsnorm": true,
"rope_ratio": 1,
"seq_length": 8192,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.2",
"use_cache": true,
"vision_config": {
"dropout_prob": 0.0,
"hidden_act": "gelu",
"hidden_size": 1792,
"image_size": 1120,
"in_channels": 3,
"intermediate_size": 15360,
"layer_norm_eps": 1e-06,
"num_heads": 16,
"num_hidden_layers": 63,
"num_positions": 6401,
"patch_size": 14,
"scaling_factor": 8
},
"vocab_size": 151552
}
[INFO:swift] generation_config: GenerationConfig { "do_sample": true, "eos_token_id": 151329, "max_new_tokens": 2048, "pad_token_id": 151329, "temperature": 0.3, "top_k": 20, "top_p": 0.7 }
[INFO:swift] lora_target_modules: ['self_attention.query_key_value']
[INFO:swift] lora_modules_to_save: []
[INFO:swift] lora_config: get_wrapped_class.
quires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.input_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.base_layer.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.base_layer.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_A.default.weight]: requires_grad=True, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_B.default.weight]: requires_grad=True, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.self_attention.dense.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.post_attention_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.mlp.dense_h_to_4h.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.0.mlp.dense_4h_to_h.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.input_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.base_layer.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.base_layer.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.lora_A.default.weight]: requires_grad=True, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.self_attention.query_key_value.lora_B.default.weight]: requires_grad=True, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.self_attention.dense.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.post_attention_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.mlp.dense_h_to_4h.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.1.mlp.dense_4h_to_h.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [base_model.model.transformer.encoder.layers.2.input_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] ...
[INFO:swift] PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): ChatGLMForConditionalGeneration(
      (transformer): ChatGLMModel(
        (embedding): Embedding(
          (word_embeddings): Embedding(151552, 4096)
        )
        (rotary_pos_emb): RotaryEmbedding()
        (encoder): GLMTransformer(
          (layers): ModuleList(
            (0-39): 40 x GLMBlock(
              (input_layernorm): RMSNorm()
              (self_attention): SelfAttention(
                (query_key_value): lora.Linear(
                  (base_layer): Linear(in_features=4096, out_features=4608, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=4096, out_features=8, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=8, out_features=4608, bias=False)
                  )
                  (lora_embedding_A): ParameterDict()
                  (lora_embedding_B): ParameterDict()
                )
                (core_attention): CoreAttention(
                  (attention_dropout): Dropout(p=0.0, inplace=False)
                )
                (dense): Linear(in_features=4096, out_features=4096, bias=False)
              )
              (post_attention_layernorm): RMSNorm()
              (mlp): MLP(
                (dense_h_to_4h): Linear(in_features=4096, out_features=27392, bias=False)
                (dense_4h_to_h): Linear(in_features=13696, out_features=4096, bias=False)
              )
            )
          )
          (final_layernorm): RMSNorm()
        )
        (output_layer): Linear(in_features=4096, out_features=151552, bias=False)
        (vision): EVA2CLIPModel(
          (patch_embedding): PatchEmbedding(
            (proj): Conv2d(3, 1792, kernel_size=(14, 14), stride=(14, 14))
            (position_embedding): Embedding(6401, 1792)
          )
          (transformer): Transformer(
            (layers): ModuleList(
              (0-62): 63 x TransformerLayer(
                (input_layernorm): LayerNorm((1792,), eps=1e-06, elementwise_affine=True)
                (attention): Attention(
                  (query_key_value): Linear(in_features=1792, out_features=5376, bias=True)
                  (dense): Linear(in_features=1792, out_features=1792, bias=True)
                  (output_dropout): Dropout(p=0.0, inplace=False)
                )
                (mlp): MLP(
                  (activation_fn): GELUActivation()
                  (fc1): Linear(in_features=1792, out_features=15360, bias=True)
                  (fc2): Linear(in_features=15360, out_features=1792, bias=True)
                )
                (post_attention_layernorm): LayerNorm((1792,), eps=1e-06, elementwise_affine=True)
              )
            )
          )
          (linear_proj): GLU(
            (linear_proj): Linear(in_features=4096, out_features=4096, bias=False)
            (norm1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
            (act1): GELU(approximate='none')
            (dense_h_to_4h): Linear(in_features=4096, out_features=13696, bias=False)
            (gate_proj): Linear(in_features=4096, out_features=13696, bias=False)
            (dense_4h_to_h): Linear(in_features=13696, out_features=4096, bias=False)
          )
          (conv): Conv2d(1792, 4096, kernel_size=(2, 2), stride=(2, 2))
        )
      )
    )
  )
)
[INFO:swift] PeftModelForCausalLM: 13909.1062M Params (2.7853M Trainable [0.0200%]), 0.0000M Buffers.
[INFO:swift] Setting model.config.use_cache: False
[INFO:swift] check dataset...
[INFO:swift] check_dataset_strategy: 'warning'
100%|█████████████████████████████████| 69182/69182 [00:06<00:00, 10917.65it/s]
100%|█████████████████████████████████████| 698/698 [00:00<00:00, 11636.12it/s]
[INFO:swift] train_dataset: Dataset({
features: ['query', 'response', 'images'],
num_rows: 69182
})
[INFO:swift] val_dataset: Dataset({
features: ['query', 'response', 'images'],
num_rows: 698
})
[INFO:swift] system: None
[INFO:swift] args.lazy_tokenize: True
[INFO:swift] [INPUT_IDS] [151331, 151333, 151336, 198, 151339, 151329, 151340, 785, 274, 23121, 3867, 1992, 389, 6702, 220, 17, 11, 220, 115937, 16, 11, 323, 432, 702, 1012, 220, 120392, 2849, 2474, 279, 274, 23121, 2400, 13, 3555, 374, 279, 1482, 6008, 76039, 320, 2916, 8, 897, 315, 419, 8044, 315, 32896, 30, 151337, 1986, 32896, 8044, 594, 1482, 6008, 76039, 320, 2916, 8, 897, 374, 902, 76039, 13, 151329]
[INFO:swift] [INPUT] [gMASK]
Train: 0%| | 0/4323 [00:02<?, ?it/s]
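For reference, the custom dataset passed via --dataset (E:/ygf/swift/uav/Yi_1v1.json) appears to follow the query/response/images schema that shows up in the train_dataset/val_dataset log above. A minimal sketch of one record under that assumption (field names taken from the logged features; the text and image path are hypothetical placeholders):

{"query": "What does the gauge in this image read?", "response": "The gauge reads no pressure.", "images": ["E:/ygf/swift/uav/images/0001.jpg"]}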
I'm using the latest ms-swift and the latest glm4v .py files.
CUDA_VISIBLE_DEVICES=0 swift sft --model_type glm4v-9b-chat --model_id_or_path /content/glm-4v-9b-4-bits --dataset /content/drive/MyDrive/glm/training_data.jsonl --output_dir /content/drive/MyDrive/glm/output
Fine-tuning the quantized glm-4v-9b model with the command above fails with an error; inference with the same model works fine.
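In case it helps narrow this down: the SftArguments dump earlier in this thread shows quant_method=None and quantization_bit=0, so the pre-quantized /content/glm-4v-9b-4-bits checkpoint may be loaded without the quantization settings the trainer expects. A hedged sketch of the same command with those flags spelled out explicitly (the flag names come from the args dump above; the value 'bnb' and whether this actually resolves the error are assumptions, not confirmed):

CUDA_VISIBLE_DEVICES=0 swift sft \
  --model_type glm4v-9b-chat \
  --model_id_or_path /content/glm-4v-9b-4-bits \
  --quant_method bnb \
  --quantization_bit 4 \
  --sft_type lora \
  --dataset /content/drive/MyDrive/glm/training_data.jsonl \
  --output_dir /content/drive/MyDrive/glm/output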