kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Apache License 2.0

Would you support glm4-chat-1m? #65

Open choyakawa opened 2 months ago

choyakawa commented 2 months ago

I have some concerns about this. Based on my experience, GGUF with llama.cpp seems to work differently from transformers, whereas GGML with chatglm.cpp behaves the same as transformers. I haven't yet identified the exact differences. Therefore, an optimization for long-context handling with transformers would be very helpful.

qiyuxinlin commented 2 months ago

Thank you for following our work! We will take some time to evaluate how to incorporate glm4-chat-1m into our framework.