intel / xFasterTransformer

Apache License 2.0

[Model] Fix array out of bounds when rank > 2. #441

Closed Duyi-Wang closed 1 month ago

Duyi-Wang commented 1 month ago

In the next version, we may not need FP32 for logits.

Dropping FP32 would reduce communication overhead, but we would still need to convert the logits to FP32 when they are passed to Python for sampling. We could do the conversion and the reorder at the same time.
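
A rough sketch of what "convert and reorder at the same time" could look like. This is not the xFasterTransformer implementation. It assumes the logits are communicated as BF16 (the comment does not name the reduced-precision type), that the vocabulary is split evenly across ranks, and that the gathered buffer is laid out as `[rank][batch][split]` while the sampler wants `[batch][vocab]`. The names `bf16ToFloat` and `convertAndReorder` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen a bfloat16 bit pattern to FP32. BF16 is the upper 16 bits of an
// IEEE-754 single-precision float, so the conversion is a 16-bit left shift.
inline float bf16ToFloat(uint16_t bits) {
    uint32_t wide = static_cast<uint32_t>(bits) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));
    return out;
}

// Hypothetical helper (assumed layout, see the paragraph above):
//   gathered: [worldSize][batch][splitSize] BF16 logits, one slab per rank
//   returns:  [batch][worldSize * splitSize] FP32 logits
// Each element is read once and written once, so the FP32 conversion adds
// no extra pass over the data beyond the reorder itself.
std::vector<float> convertAndReorder(const std::vector<uint16_t> &gathered,
                                     size_t worldSize, size_t batch, size_t splitSize) {
    const size_t vocab = worldSize * splitSize;
    std::vector<float> out(batch * vocab);
    for (size_t r = 0; r < worldSize; ++r) {
        for (size_t b = 0; b < batch; ++b) {
            const uint16_t *src = gathered.data() + (r * batch + b) * splitSize;
            float *dst = out.data() + b * vocab + r * splitSize;
            for (size_t j = 0; j < splitSize; ++j) {
                dst[j] = bf16ToFloat(src[j]);
            }
        }
    }
    return out;
}
```

With this shape, only the 16-bit values cross the wire, and the FP32 buffer handed to Python is produced in the same loop that fixes the layout.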