Closed by garrett4wade 1 month ago.
Changes
The model worker can get stuck in its while-loop when receiving requests. Fix this by no longer blocking while waiting for requests.
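A minimal sketch of the general non-blocking pattern, not the project's actual code. The worker drains only the requests that are already available and then returns to its main loop instead of waiting on an empty channel, so it can still check exit conditions or do other work. The stdlib `queue.Queue` stands in for the real request stream, and `poll_requests`, `run_worker`, `handle`, and `should_exit` are hypothetical names.

```python
import queue
import time
from typing import Any, Callable, List


def poll_requests(channel: queue.Queue, max_requests: int = 16) -> List[Any]:
    """Return up to max_requests pending requests without ever blocking.

    An empty channel yields an empty list immediately, so the caller's
    loop keeps making progress instead of hanging on a blocking get().
    """
    received: List[Any] = []
    while len(received) < max_requests:
        try:
            received.append(channel.get_nowait())
        except queue.Empty:
            break  # nothing more is available right now; do not wait
    return received


def run_worker(
    channel: queue.Queue,
    handle: Callable[[Any], None],
    should_exit: Callable[[], bool],
    idle_sleep: float = 0.001,
) -> int:
    """Hypothetical worker loop: poll, handle, repeat until should_exit()."""
    handled = 0
    while not should_exit():
        batch = poll_requests(channel)
        if not batch:
            time.sleep(idle_sleep)  # back off briefly instead of busy-spinning
        for request in batch:
            handle(request)
            handled += 1
    return handled
```

Because `poll_requests` never blocks, the exit condition is re-checked on every iteration, even when no requests arrive.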
The `min_size` in mini-batched pipeline inference and generation should be `pipeline_parallel_world_size()` instead of 1.
Remove some dead docstrings.
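For the `min_size` change, a sketch under an explicit assumption: that `min_size` is the lower bound on how many micro-batches a batch is split into. In a typical pipeline schedule, a pipeline with P stages needs at least P micro-batches in flight to keep every stage busy. With a lower bound of 1, a whole batch can travel as a single micro-batch, leaving all but one stage idle at any moment. `split_into_microbatches` is a hypothetical helper, and `pp_size = 4` stands in for the value `pipeline_parallel_world_size()` would return.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_microbatches(items: Sequence[T], n_mbs: int, min_size: int) -> List[List[T]]:
    """Split items into contiguous, near-equal micro-batches (hypothetical helper).

    At least `min_size` micro-batches are produced, and never more than
    len(items), so no micro-batch is empty.
    """
    if len(items) < min_size:
        raise ValueError(f"need at least {min_size} items, got {len(items)}")
    n = min(max(n_mbs, min_size), len(items))
    base, extra = divmod(len(items), n)
    out: List[List[T]] = []
    start = 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first micro-batches
        out.append(list(items[start:start + size]))
        start += size
    return out


pp_size = 4  # stand-in for pipeline_parallel_world_size()
batch = list(range(10))
mbs = split_into_microbatches(batch, n_mbs=1, min_size=pp_size)
print([len(m) for m in mbs])  # [3, 3, 2, 2]: one micro-batch per pipeline stage
```

With `min_size=1`, the same call returns a single micro-batch of 10 items, which is the behavior the change avoids.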