intel / xFasterTransformer
Apache License 2.0 · 268 stars · 52 forks
Issues (sorted by: Newest)
#466 · [Kernel] Make SelfAttention prepared for AMX_FP16; More balanced task split in Cross Attention · opened by pujiang2018 20 hours ago · 1 comment
#465 · [Readme] Add accepted papers · closed, wenhuanh, 1 day ago · 0 comments
#464 · [Layers] Fix invokeAttentionLLaMA API · closed, wenhuanh, 1 day ago · 1 comment
#463 · [Dependency] Bump web_demo requirement. · closed, Duyi-Wang, 3 days ago · 0 comments
#462 · Add env param KV_CACHE_LOCATION to control kv cache memory numanode location · opened by a3213105 1 week ago · 2 comments
#461 · [Model] Group support for int8/int4 models · opened by xiangzez 1 week ago · 0 comments
#460 · [Kernel] Cache oneDNN primitive when M < `XFT_PRIMITIVE_CACHE_M`, default 256. · closed, Duyi-Wang, 1 week ago · 0 comments
#459 · [Layers] Enable AMX FP16 of FlashAttn · closed, abenmao, 3 days ago · 0 comments
#458 · [Denpendency] Pin python requirements.txt version. · closed, Duyi-Wang, 1 week ago · 0 comments
#457 · [Bugfix] fixed shm reduceAdd & rope error when batch size is large · closed, abenmao, 2 weeks ago · 0 comments
#456 · [Feature] Enable AMX FP16 on next generation CPU · closed, wenhuanh, 1 week ago · 4 comments
#455 · [run_benchmark.sh] Few cores are running on HBM when batch-size >16 or 32 · closed, hangfu-guo, 2 weeks ago · 3 comments
#454 · [Version] v1.7.2. · closed, Duyi-Wang, 2 weeks ago · 0 comments
#453 · [Model] Support hybrid model in continuous batching. · closed, Duyi-Wang, 2 weeks ago · 0 comments
#452 · [Kernel] Enable continuous batching on single GPU. · closed, changqi1, 2 weeks ago · 0 comments
#451 · [Tools] Add Baichuan1/2 convert tool · closed, abenmao, 2 weeks ago · 0 comments
#450 · [Framework] Remove duplicated code · closed, xiangzez, 2 weeks ago · 0 comments
#449 · [Layers] Add qwenRope support for Qwen1.0 in CB mode · closed, abenmao, 2 weeks ago · 2 comments
#448 · [Doc] Add vllm benchmark docs. · closed, marvin-Yu, 3 weeks ago · 0 comments
#447 · [request] qwen1 not supported by vllm-xft · closed, zhm-algo, 2 weeks ago · 3 comments
#446 · [bug] HBM flat QUAD mode determination method is incorrect · opened by xuyizhan 3 weeks ago · 0 comments
#445 · [Version] v1.7.1. · closed, Duyi-Wang, 3 weeks ago · 0 comments
#444 · Fixed punctuation error in README · opened by denniszhen1 3 weeks ago · 0 comments
#443 · Update README.md · closed, denniszhen1, 3 weeks ago · 0 comments
#442 · Bump gradio from 4.19.2 to 4.36.0 in /examples/web_demo · closed, dependabot[bot], 4 weeks ago · 0 comments
#441 · [Model] Fix array out of bounds when rank > 2. · closed, Duyi-Wang, 4 weeks ago · 1 comment
#440 · Crash when using CB mode with multi-rank · closed, a3213105, 4 weeks ago · 0 comments
#439 · [Model] Add Qwen2 GPTQ model support · closed, xiangzez, 4 weeks ago · 0 comments
#438 · Add Continue Batching support for Chatglm2/3 · closed, a3213105, 4 weeks ago · 1 comment
#437 · [Kernel] Expand rmsNorm op. · closed, changqi1, 4 weeks ago · 2 comments
#436 · [Common] Add INT8/UINT4 to BF16 weight convert · closed, xiangzez, 1 month ago · 0 comments
#435 · [README] Update README.md. · closed, Duyi-Wang, 1 month ago · 0 comments
#434 · [README] Update README.md. · closed, Duyi-Wang, 1 month ago · 0 comments
#433 · [Version] v1.7.0. · closed, Duyi-Wang, 1 month ago · 0 comments
#432 · [Dependency] Fix wrong so path returned in `get_env()`. · closed, Duyi-Wang, 1 month ago · 0 comments
#431 · [README] Update readme. · closed, Duyi-Wang, 1 month ago · 0 comments
#430 · [Dependency] Update libiomp5.so to `5.0.20230815` contained in mkl. · closed, Duyi-Wang, 1 month ago · 0 comments
#429 · [Layers] Fixed error in yarn · closed, abenmao, 1 month ago · 0 comments
#428 · [Layers] Increased the threshold for enabling flashAttn · opened by abenmao 1 month ago · 0 comments
#427 · [Python] Add `get_env()` to get LD_PRELOAD set. · closed, Duyi-Wang, 1 month ago · 0 comments
#426 · [CI] Check gcc version. · closed, changqi1, 1 month ago · 0 comments
#425 · [Kernel] Add dynamic onednn matmul. · opened by changqi1 1 month ago · 0 comments
#424 · [Layers] Fixed the seg fault error when running with more than 4 ranks · closed, abenmao, 1 month ago · 0 comments
#423 · [COMM] Fix bugs of core dump && hang when running cross nodes · closed, abenmao, 1 month ago · 0 comments
#422 · [xDNN] Release v1.5.1. · closed, changqi1, 1 month ago · 0 comments
#421 · [Distribute] Add distribute support for continuous batching api. · closed, Duyi-Wang, 1 month ago · 2 comments
#420 · [Kernel] Less compute for Self-Attention (Q * K) · closed, pujiang2018, 1 month ago · 0 comments
#419 · gcc 8.2 compilation error · closed, bukejiyu, 4 weeks ago · 3 comments
#418 · Add --padding and fix bug · closed, yangkunx, 1 month ago · 0 comments
#417 · [Kernel] Add oneDNN AMX_FP16 compute kernels. · closed, changqi1, 1 month ago · 1 comment