hpcaitech / EnergonAI

Large-scale model inference.
Apache License 2.0

Added parallel code for chatglm-6B #225

Open Caesar1993 opened 1 year ago

Caesar1993 commented 1 year ago

This PR adds parallel inference code for ChatGLM-6B. Because the model has relatively few parameters, parallel inference is not faster than loading it on a single card, but the code can serve as a reference for parallel inference with larger GLM models.

  1. Split the fused QKV projection in the Hugging Face ChatGLM checkpoint into per-head slices, take out each head's Q, K, and V, and concatenate them back into a whole QKV block for each tensor-parallel rank (see the first sketch after this list).
  2. Move ChatGLM's layer definitions into `__init__` and rebuild the forward function on top of Colossal-AI's basic parallel layers (see the second sketch below).
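
A minimal sketch of the weight reshuffling in step 1, assuming ChatGLM-6B's `query_key_value.weight` uses the per-head interleaved layout `[q0, k0, v0, q1, k1, v1, ...]`; the function name, signature, and shapes are illustrative, not the PR's actual code:

```python
import torch

def shard_fused_qkv(weight: torch.Tensor, num_heads: int, head_dim: int,
                    tp_size: int) -> list:
    # weight: (3 * num_heads * head_dim, hidden), stored by the HF checkpoint
    # in per-head interleaved order [q0, k0, v0, q1, k1, v1, ...] (assumption)
    hidden = weight.shape[-1]
    w = weight.view(num_heads, 3, head_dim, hidden)  # one (q, k, v) triple per head
    q, k, v = w.unbind(dim=1)                        # each: (num_heads, head_dim, hidden)
    heads_per_rank = num_heads // tp_size
    shards = []
    for r in range(tp_size):
        sl = slice(r * heads_per_rank, (r + 1) * heads_per_rank)
        # take this rank's per-head Q, K, V slices and concatenate them
        # back into one whole QKV block for that rank
        shards.append(torch.cat([
            q[sl].reshape(-1, hidden),
            k[sl].reshape(-1, hidden),
            v[sl].reshape(-1, hidden),
        ], dim=0))
    return shards
```

With this layout, each rank's local QKV output can later be split into Q, K, and V with a plain `chunk(3, dim=-1)`.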
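
And a sketch of the restructuring in step 2: layers declared in `__init__`, with the forward pass rebuilt on 1D tensor-parallel layers. It assumes Colossal-AI's `Linear1D_Col` / `Linear1D_Row` (import paths differ across versions) and omits ChatGLM specifics such as rotary embeddings and attention masks; the class and argument names are hypothetical:

```python
import torch
import torch.nn as nn
from colossalai.nn import Linear1D_Col, Linear1D_Row  # 1D tensor-parallel layers

class ParallelGLMAttention(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.head_dim = hidden_size // num_heads
        # column-parallel fused QKV: each rank holds only its slice of heads
        self.query_key_value = Linear1D_Col(hidden_size, 3 * hidden_size)
        # row-parallel output projection: all-reduces the partial results
        self.dense = Linear1D_Row(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        b, s, _ = hidden_states.shape
        qkv = self.query_key_value(hidden_states)     # local heads only
        q, k, v = qkv.chunk(3, dim=-1)                # works given the rank-wise QKV layout above
        # reshape to (batch, local_heads, seq, head_dim)
        q, k, v = (t.view(b, s, -1, self.head_dim).transpose(1, 2) for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        context = scores.softmax(dim=-1) @ v
        context = context.transpose(1, 2).reshape(b, s, -1)
        return self.dense(context)                    # all-reduce across ranks
```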