ray-project / llmperf

LLMPerf is a library for validating and benchmarking LLMs
Apache License 2.0
659 stars, 113 forks

Using `max_concurrency` or AsyncActor? #76

Open chiragjn opened 1 month ago

chiragjn commented 1 month ago

I was reading through the code and trying out a large number of workers with a tiny model. I often found my CPU and memory becoming the bottleneck, because each of the many actors launches its own Python worker process.

What was the motivation for not using threaded or async actors to achieve higher concurrency?
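
For context, here is a minimal sketch of the pattern being asked about: many I/O-bound requests kept in flight inside a single Python process, with a cap on how many run at once. It uses only the standard library's `asyncio` as a stand-in and is not LLMPerf or Ray code. `fake_llm_request`, `run_requests`, and the latency value are hypothetical names and numbers chosen for illustration. In Ray, the analogous setup would be an actor with `async def` methods and a `max_concurrency` option, but that is not shown here.

```python
import asyncio
import time


async def fake_llm_request(request_id: int, latency_s: float = 0.05) -> int:
    """Hypothetical stand-in for one I/O-bound LLM request."""
    await asyncio.sleep(latency_s)  # simulates waiting on the network
    return request_id


async def run_requests(num_requests: int, max_concurrency: int):
    """Run all requests in one process with at most max_concurrency in flight.

    Returns the results (in request order) and the peak number of
    requests that were in flight at the same time.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak_in_flight = 0

    async def bounded(request_id: int) -> int:
        nonlocal in_flight, peak_in_flight
        async with semaphore:  # blocks once the cap is reached
            in_flight += 1
            peak_in_flight = max(peak_in_flight, in_flight)
            try:
                return await fake_llm_request(request_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(i) for i in range(num_requests)))
    return list(results), peak_in_flight


if __name__ == "__main__":
    start = time.perf_counter()
    results, peak = asyncio.run(run_requests(num_requests=200, max_concurrency=50))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} requests, peak in flight {peak}, {elapsed:.2f}s")
```

Because the requests spend most of their time waiting rather than computing, one event loop can keep many of them in flight without starting a separate worker process for each, which is the CPU and memory cost described above.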