DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
Apache License 2.0
fix: change to size_t to avoid overflow when seq is long #11
When `max_engine_len` exceeds 2^14 = 16384, the offset `k` in the `BatchSoftmax` function overflows its integer type. This is why the engine could previously only support about an 11k context length.
Now it is fixed, and the engine can support a context length of up to 32k. (: