Closed arnavdantuluri closed 10 months ago
I sent an email to one of the researchers. Hopefully I'll get a reply. This paper is honestly a giant step forward.
Please let us know if you get a reply 🙂
Looking forward to it
Looking forward to the code release!
Please release the code!
Would appreciate it a lot if you could give the community a heads-up on the ETA.
For anyone that's interested, I built an (unofficial) LongNet implementation here: https://github.com/fkodom/dilated-attention-pytorch
There are no pretrained weights -- I don't have the personal compute budget for that. 😂 But the main concepts are there, and I reproduced scaled-down versions of the inference benchmarks.
Hopefully it's interesting for tinkering and exploration, at least until the official code comes out. 😉
Any updates on when this might officially come out? @fkodom's repo https://github.com/fkodom/dilated-attention-pytorch is a pretty good implementation while we wait.

@DeepDream2045 and I benchmarked it processing 64 million tokens on an RTX A5000 in linear time, and it scaled up nicely to 256 million tokens on a single A100. Both runs used embed_dim=8 and num_heads=4. We also validated that the MultiheadDilatedAttention class could handle 32 million tokens with embed_dim=128 and num_heads=8 on an A100.

Memory and runtime appear to scale linearly in both embed_dim and sequence length, and increasing the number of heads also increases runtime linearly. Finally, I plotted the results for varying embed_dim and num_heads at 4 million tokens on an A100, starting from embed_dim=1024 with 32 heads and working down to embed_dim=32 with 4 heads.
I've modified the benchmark.py file to make benchmarking easier here: https://github.com/DarcStar-Solutions-Tech/dilated-attention-pytorch
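For anyone curious why the benchmarks above scale linearly, here is a toy, dependency-free sketch that just counts query-key pairs under segmented, dilated attention. The `segment_len` and `dilation` parameters are illustrative assumptions, not the exact API of either repo:

```python
def dilated_attention_pairs(seq_len, segment_len, dilation):
    """Count query-key pairs when attention is restricted to segments,
    with only every `dilation`-th position in a segment kept as a key.

    Dense attention costs seq_len**2 pairs; here each segment contributes
    segment_len * (segment_len / dilation) pairs, so the total grows
    linearly with seq_len for fixed segment_len and dilation.
    """
    pairs = 0
    for start in range(0, seq_len, segment_len):
        segment = range(start, min(start + segment_len, seq_len))
        keys = [i for i in segment if (i - start) % dilation == 0]
        pairs += len(segment) * len(keys)
    return pairs


# Doubling the sequence length doubles the cost (linear), while dense
# attention would quadruple it (quadratic).
print(dilated_attention_pairs(4096, 512, 4))  # 524288
print(dilated_attention_pairs(8192, 512, 4))  # 1048576
```

This ignores LongNet's mixture of multiple segment/dilation scales, but the same counting argument applies to each scale independently, which is why the total stays linear in sequence length.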
@shumingma @gitnlp @sunyt32 @donglixp @buaahsh @microsoftopensource
Any update regarding the release date for the code? I am currently working on a project with Google and I am interested in benchmarking your new architecture.
Your response is highly appreciated @shumingma @gitnlp @sunyt32 @donglixp @buaahsh @microsoftopensource
Thanks @donglixp for assigning @shumingma for the model release.
@shumingma any estimation when the model code will be released ?
@shumingma and @donglixp it would be great if you could share the timeline for all of us.
Hello all, just wondering whether there is an ETA on the official release of the LongNet code. It was mentioned here https://github.com/microsoft/unilm/issues/1182#issuecomment-1624938095 that the LongNet code would be released as part of torchscale. Looking forward to seeing the official implementation!