kyegomez / LongNet

Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
https://discord.gg/qUtxnK2NMf
Apache License 2.0

Training with gpus #7

Closed — fangxy100 closed this issue 1 year ago

fangxy100 commented 1 year ago

Could you tell me which versions of accelerate and torch you chose?

kyegomez commented 1 year ago

@fangxy100 Hey, torch==2.0 and the newest accelerate; flash attention would not work otherwise. Why do you ask?
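To avoid hitting the flash-attention failure at runtime, you can verify the installed torch version up front. A minimal sketch of such a check — the helper name `meets_requirement` is illustrative, not part of LongNet:

```python
def meets_requirement(version: str, minimum: tuple = (2, 0)) -> bool:
    """Check a torch-style version string like '2.0.1+cu118' against a minimum.

    PyTorch appends a local build tag (e.g. '+cu118') to its version string,
    so strip it before comparing the numeric components.
    """
    core = version.split("+")[0]
    parts = tuple(int(p) for p in core.split(".")[:2])
    return parts >= minimum


# In practice you would pass torch.__version__ here.
print(meets_requirement("2.0.1+cu118"))  # True
print(meets_requirement("1.13.1"))       # False
```

Flash attention ships with PyTorch's `torch.nn.functional.scaled_dot_product_attention` starting in 2.0, which is why older torch versions fail.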