Closed tanmay-bakshi closed 1 year ago
The input token IDs should be `long`, not `bfloat16`, when using the Triton attention implementation, as they're fed to an embedding layer.
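For context, here is a minimal PyTorch sketch (not the repo's actual code) of why the dtype matters: `nn.Embedding` performs an index lookup, so it requires integer indices and will raise a dtype error if it receives `bfloat16` token IDs.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab_size, d_model = 32, 8
embedding = nn.Embedding(vocab_size, d_model)

# Token IDs as produced by a tokenizer: int64 ("long").
token_ids = torch.randint(0, vocab_size, (2, 4))

# If an over-eager dtype cast turns the IDs into bfloat16...
bad_ids = token_ids.to(torch.bfloat16)

# ...cast them back to long before the embedding lookup.
fixed_ids = bad_ids.long()
hidden = embedding(fixed_ids)   # works; embedding(bad_ids) would raise
print(hidden.shape)             # torch.Size([2, 4, 8])
```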
Thanks for catching this, Tanmay! Merging your fix here, and also mirroring it to our HuggingFace repo.