openpsi-project / ReaLHF

Super-Efficient RLHF Training of LLMs with Parameter Reallocation
Apache License 2.0

[Bug Fixes] Fix model worker stuck under some special circumstances. #67

Closed by garrett4wade 1 month ago

garrett4wade commented 1 month ago

Changes

  1. The model worker could get stuck in its while-loop when receiving requests. Fix this by no longer blocking while waiting for requests.

  2. The min_size in mini-batched pipeline inference and generation should be pipeline_parallel_world_size() instead of 1.

  3. Remove some dead docstrings.
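
To illustrate the idea behind change 1, here is a minimal sketch of a worker loop that polls for requests without blocking. It is not the ReaLHF implementation: `poll_requests`, `run_worker`, `handle`, `should_exit`, and `max_iters` are hypothetical names, and the standard-library `queue.Queue` stands in for whatever transport the model worker actually uses.

```python
import queue


def poll_requests(request_queue, max_requests=8):
    """Return up to max_requests pending requests without ever blocking."""
    received = []
    for _ in range(max_requests):
        try:
            # get_nowait() raises queue.Empty instead of waiting for an item.
            received.append(request_queue.get_nowait())
        except queue.Empty:
            break
    return received


def run_worker(request_queue, handle, should_exit, max_iters=1000):
    """Hypothetical worker loop: check the exit condition on every iteration."""
    handled = 0
    for _ in range(max_iters):
        if should_exit():
            break
        # An empty queue yields an empty list, so the loop keeps going
        # rather than stalling inside a blocking receive.
        for request in poll_requests(request_queue):
            handle(request)
            handled += 1
    return handled
```

Because the receive call returns immediately when nothing is pending, the loop reaches its exit check on every iteration. With a blocking receive, the worker could wait indefinitely on a request that never arrives.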