vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.
https://docs.vectorch.com/
Apache License 2.0
316 stars, 23 forks

fix: fix weight load issue for fused qkv and added more unittests for weight loading #213

Closed: guocuimi closed this 1 month ago

guocuimi commented 1 month ago

Replace `repeat` with `repeat_interleave` so that the kv weights are repeated along the kv_head dim when n_kv_heads < world_size.
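To illustrate why the ordering matters, here is a minimal sketch that uses plain Python lists in place of weight tensors. It is not the ScaleLLM code: the helpers `tile`, `interleave`, and `shard` are hypothetical stand-ins for `torch.Tensor.repeat`, `torch.repeat_interleave`, and the per-rank split. It assumes query heads are sharded contiguously across ranks, so rank `r` needs kv head `r // (world_size // n_kv_heads)`.

```python
def tile(heads, times):
    """Mimic torch.Tensor.repeat on the head dim: the whole sequence is repeated."""
    return heads * times


def interleave(heads, times):
    """Mimic torch.repeat_interleave on the head dim: each head is repeated in place."""
    return [h for h in heads for _ in range(times)]


def shard(heads, world_size):
    """Split the head list into equal contiguous chunks, one chunk per rank."""
    per_rank = len(heads) // world_size
    return [heads[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]


# Hypothetical setup: 2 kv heads shared across 4 tensor-parallel ranks.
n_kv_heads, world_size = 2, 4
kv_heads = ["kv0", "kv1"]
replicas = world_size // n_kv_heads

tiled = shard(tile(kv_heads, replicas), world_size)
interleaved = shard(interleave(kv_heads, replicas), world_size)

# Under the contiguous-sharding assumption, ranks 0 and 1 hold the query
# heads of kv0, and ranks 2 and 3 hold the query heads of kv1.
expected = [[kv_heads[r // replicas]] for r in range(world_size)]

print("tiled:      ", tiled)
print("interleaved:", interleaved)
print("expected:   ", expected)
```

In this sketch the tiled layout hands `kv1` to rank 1 even though that rank's query heads belong to `kv0`, while the interleaved layout matches the expected assignment on every rank.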