kyegomez / LongNet

Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
https://discord.gg/qUtxnK2NMf
Apache License 2.0
663 stars · 63 forks

Training with gpus #7

Closed fangxy100 closed 11 months ago

fangxy100 commented 11 months ago

Could you tell me which versions of accelerate and torch you used?

kyegomez commented 11 months ago

@fangxy100 Hey, torch==2.0 and the newest release of accelerate; flash attention would not work otherwise. Why do you ask?
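For reference, an environment matching the versions mentioned above could be set up roughly like this (the exact pip commands and version pins are an assumption, not taken from the repo's requirements file):

```shell
# Assumed setup: torch 2.0+ provides the fused scaled-dot-product
# attention kernels that flash attention relies on; accelerate is
# kept at its latest release, as suggested in the reply above.
pip install "torch>=2.0" --upgrade
pip install --upgrade accelerate
```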