Duyi-Wang closed this 1 month ago
In the next version, we may not need FP32 for logits.
Using a lower precision would reduce communication overhead, but the logits still need to be converted to FP32 when they are passed to Python for sampling. We can do the conversion and the reordering at the same time.
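A rough sketch of what "convert and reorder at the same time" could look like, written in NumPy for illustration only. The layout is an assumption, not something stated in this thread: the gathered buffer is taken to be `(num_ranks, batch, shard)`, with each rank holding one contiguous slice of the vocabulary, and `float16` stands in for whatever low-precision type is actually used (NumPy has no native BF16). The function name `convert_and_reorder` is hypothetical.

```python
import numpy as np


def convert_and_reorder(gathered: np.ndarray) -> np.ndarray:
    """Fuse the FP32 conversion with the layout change in a single copy.

    gathered: low-precision logits, assumed shape (num_ranks, batch, shard).
    Returns:  float32 logits of shape (batch, num_ranks * shard).
    """
    num_ranks, batch, shard = gathered.shape
    out = np.empty((batch, num_ranks * shard), dtype=np.float32)
    for r in range(num_ranks):
        # Assigning into a float32 destination casts each element as it is
        # copied, so no intermediate full-size FP32 buffer is created.
        out[:, r * shard:(r + 1) * shard] = gathered[r]
    return out


# Example: 2 ranks, batch of 2, 3 vocabulary entries per rank.
gathered = np.arange(12, dtype=np.float16).reshape(2, 2, 3)
logits = convert_and_reorder(gathered)
print(logits.dtype, logits.shape)
```

The point of the sketch is that each element is read once in low precision and written once in FP32 at its final position, rather than being converted in one pass and reordered in a second.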