vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.
https://docs.vectorch.com/
Apache License 2.0
316 stars 23 forks

[Core] core dump on the chatglm3 model using scalellm. #221

Closed liutongxuan closed 4 weeks ago

liutongxuan commented 1 month ago

Model: THUDM/chatglm3-6b

Reproduce: python3 run_data.py --input_file /data/dataset/Chatbot_group_10_2.json --model_dir=/data/chatglm3-6b --batch_size=1

Running the chatglm3 model using the ScaleLLM core crashes with a floating point exception.

Output:

I20240605 05:09:23.595679 713 llm_engine.cpp:166] Initializing model with ModelArgs: [model_type: chatglm, dtype: float16, hidden_size: 4096, hidden_act: , intermediate_size: 13696, n_layers: 28, head_dim: 128, n_heads: 32, n_kv_heads: 2, vocab_size: 65024, rms_norm_eps: 0, layer_norm_eps: 1e-05, rotary_dim: 128, rope_theta: 10000, rope_scaling: 0, rotary_pct: 0.5, max_position_embeddings: 8192, bos_token_id: 0, eos_token_id: 2, use_parallel_residual: 0, attn_qkv_clip: 0, attn_qk_ln: 0, attn_alibi: 0, alibi_bias_max: 0, no_bias: 0, linear_bias: 0, qkv_bias: 1, residual_post_layernorm: 0]
I20240605 05:09:23.595701 713 llm_engine.cpp:167] Initializing model with quant args: QuantArgs: [quant_method: , bits: 0, group_size: 0, desc_act: 0, true_sequential: 0]
I20240605 05:09:23.595710 713 llm_engine.cpp:168] Initializing model with tokenizer args: TokenizerArgs: [tokenizer_type: sentencepiece, vocab_file: tokenizer.model, special_tokens: [([MASK], 64789) ([gMASK], 64790) ([sMASK], 64791) (sop, 64792) (eop, 64793) (<|system|>, 64794) (<|user|>, 64795) (<|assistant|>, 64796) (<|observation|>, 64797) ], pattern: , prefix_tokens: [[gMASK], sop]]
Floating point exception (core dumped)
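One value that stands out in the log, as a hedged observation rather than a confirmed root cause: `rms_norm_eps: 0` (while `layer_norm_eps` is populated). A zero epsilon removes the only guard RMSNorm has against dividing by zero. The sketch below is a textbook RMSNorm in plain Python, not ScaleLLM's actual kernel, showing how the epsilon protects the denominator:

```python
import math

def rms_norm(xs, weight, eps):
    """Textbook RMSNorm: x_i * w_i / sqrt(mean(x^2) + eps)."""
    mean_sq = sum(x * x for x in xs) / len(xs)
    denom = math.sqrt(mean_sq + eps)  # eps keeps denom > 0 for all-zero inputs
    return [x / denom * w for x, w in zip(xs, weight)]

# With a sane epsilon, an all-zero activation vector is still safe:
safe = rms_norm([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], eps=1e-5)

# With eps = 0, as printed in the log, the same input divides by zero:
try:
    rms_norm([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], eps=0.0)
except ZeroDivisionError:
    pass  # Python traps this; C++ float math would instead produce inf/nan
```

Note the caveat in the last comment: in native code a float division by zero typically yields inf rather than a signal, and "Floating point exception (core dumped)" (SIGFPE) is most often raised by an integer division or modulo by zero. So the zero epsilon is at best a hint that a config value failed to load correctly, not proof of the crash site.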

guocuimi commented 4 weeks ago

Seems not reproducible in the latest build; looks like a build issue. Let's close it for now.