intel / neural-speed

An innovative library for efficient LLM inference via low-bit quantization
https://github.com/intel/neural-speed
Apache License 2.0
350 stars 38 forks

[Neural Speed] Fix `ret` when `ignore_prompt` #278

Closed: zhentaoyu closed this 5 months ago

zhentaoyu commented 5 months ago

Type of Change

feature or bug fix or documentation or others
API changed or not

Description

detail description
Issues: xxx

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

a32543254 commented 5 months ago

We may add some recommended benchmark configs for reaching max throughput, such as instance count and batch size.

zhentaoyu commented 5 months ago

> We may add some recommended benchmark configs for reaching max throughput, such as instance count and batch size.

It depends. We can maintain a table once we have run more experiments across different machines and settings (SPR, client, generation methods, first-token length, etc.).