modelscope / dash-infer

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
Apache License 2.0

fix: change to size_t to avoid overflow when seq is long #11

Closed yejunjin closed 3 months ago

yejunjin commented 3 months ago

When max_engine_len exceeds 2^14 = 16384, the offset k in the BatchSoftmax function raises an integer overflow. That is why the engine could only support about an 11k context length.

Now it is fixed, and the engine can support up to a 32k context length. (: