Chillee opened this issue 6 months ago
This is awesome! I found one small bug:
Grok uses gelu for the MLP block (https://github.com/xai-org/grok-1/blob/7050ed204b8206bb8645c7b7bbef7252f79561b0/model.py#L374), but Mixtral uses silu (https://github.com/pytorch-labs/gpt-fast/blob/de06b53a4f95c72cd3abd0a8e9fa2d6913676c1a/mixtral-moe/model.py#L214).
You should replace silu with gelu. Otherwise, the model can still generate meaningful text, but its performance is significantly degraded.
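To illustrate why this matters, here is a minimal stdlib-only sketch comparing the two activations. The function names `silu` and `gelu` are just illustrative; in the gpt-fast code the fix would amount to swapping the `F.silu` call in the MoE MLP for `F.gelu`. Note this uses the exact (erf-based) GELU for illustration; Grok-1 may use an approximate variant.

```python
import math

def silu(x: float) -> float:
    # SiLU (a.k.a. swish): x * sigmoid(x) -- what the Mixtral MLP uses
    return x / (1.0 + math.exp(-x))

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF -- what Grok-1 uses
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# The two activations diverge noticeably for inputs of moderate magnitude,
# which is why reusing the Mixtral MLP unchanged degrades Grok-1's output.
for x in (-2.0, -1.0, 1.0, 2.0):
    print(f"x={x:+.1f}  silu={silu(x):+.4f}  gelu={gelu(x):+.4f}")
```

Both are smooth, sigmoid-shaped gates on `x`, so the model still produces coherent text with the wrong one, but the mismatch with the trained weights costs quality.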
@merrymercy ah that would explain my results haha. Thanks!
Downloading from https://huggingface.co/hpcai-tech/grok-1
Run on 8xA100 80GB