kyegomez / LongNet

Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
https://discord.gg/qUtxnK2NMf
Apache License 2.0
663 stars · 63 forks

Training with gpus #7

Closed fangxy100 closed 11 months ago

fangxy100 commented 11 months ago

Could you tell me which versions of accelerate and torch you used?

kyegomez commented 11 months ago

@fangxy100 Hey, torch==2.0 and the newest release of accelerate; flash attention would not work otherwise. Why do you ask?
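For reference, an environment matching the versions mentioned above could be set up roughly like this (the exact pip commands and version pins are an assumption, not taken from the repo's requirements file):

```shell
# Assumed setup: torch 2.0+ provides the fused scaled-dot-product
# attention kernels that flash attention relies on; accelerate is
# kept at its latest release, as suggested in the reply above.
pip install "torch>=2.0" --upgrade
pip install --upgrade accelerate
```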