Currently, max_threads in PA (Perf Analyzer) defaults to 16 in async mode. When using genai-perf, the concurrency value can be passed as a CLI argument, but PA still uses max_threads = 16 even when concurrency is set higher. This change sets max_threads to the concurrency value when concurrency > 16. When concurrency <= 16, max_threads keeps its default of 16.
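The selection rule above reduces to `max(16, concurrency)`. Below is a minimal Python sketch of that rule. The names `DEFAULT_MAX_THREADS`, `resolve_max_threads`, and `build_pa_thread_args` are hypothetical and are not genai-perf's actual code. The sketch also assumes PA accepts the thread cap through a `--max-threads` flag alongside `--concurrency-range`.

```python
# Hypothetical constant: PA's default thread cap in async mode, per the description above.
DEFAULT_MAX_THREADS = 16


def resolve_max_threads(concurrency: int) -> int:
    """Return the max_threads value to pass to PA for a given concurrency.

    Uses the concurrency value when it exceeds 16; otherwise keeps the default of 16.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be a positive integer")
    return max(DEFAULT_MAX_THREADS, concurrency)


def build_pa_thread_args(concurrency: int) -> list[str]:
    # Assumption: PA takes concurrency via --concurrency-range and the
    # thread cap via --max-threads; this is an illustrative sketch only.
    return [
        f"--concurrency-range={concurrency}",
        f"--max-threads={resolve_max_threads(concurrency)}",
    ]


if __name__ == "__main__":
    for c in (1, 16, 17, 64):
        print(c, resolve_max_threads(c))
```

Using `max()` makes the two cases in the description a single expression: values at or below 16 collapse to the default, and larger values pass through unchanged.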
Based on a comment from @nv-hwoo, it sounds like this fixes async mode, whereas sync mode already had similar behavior. If so, can you please state that in the PR title and description?