haoliuhl/ringattention
Transformers with Arbitrarily Large Context
Apache License 2.0 · 630 stars · 50 forks
Issues (newest first)
#22 · [BPT] Question about the Scaling Factor in Equation from the Paper · by Xiaoming-Zhao, opened 15 hours ago · 0 comments
#21 · Llama 3 ring attention implementation for inference · by joshpopelka20gmail, opened 3 months ago · 1 comment
#20 · This work doesn't change the kernel, but utilizes dependencies to compute a whole line? · by ziyuhuang123, opened 3 months ago · 0 comments
#19 · Could you provide GPU code for GPUs like the A100? · by ziyuhuang123, opened 3 months ago · 0 comments
#18 · segment_ids_ops · by haoliuhl, closed 4 months ago · 0 comments
#17 · scripts/jax2hf.py error · by liuxpro, opened 5 months ago · 1 comment
#16 · Incorrect project requirements · by hadipash, closed 4 months ago · 1 comment
#15 · Test Script Issues · by djbyrne, opened 7 months ago · 0 comments
#14 · Questions about the paper · by hiroshinoji, opened 7 months ago · 2 comments
#13 · fine-tuning model mismatch - KeyError · by chenwuperth, closed 8 months ago · 0 comments
#12 · (minor) Correct references in ring_attention.py · by Selimonder, closed 8 months ago · 1 comment
#11 · vmem OOM on TPU · by hxssgaa, closed 4 months ago · 2 comments
#10 · Pretrained models? · by matteoguarrera, closed 4 months ago · 1 comment
#9 · JAX partitioning error when attempting to run with sequence parallelism factor not a power of 2 · by exists-forall, opened 1 year ago · 0 comments
#8 · [Question] Add a normalization layer between Attention and FFN? · by findmyway, closed 1 year ago · 4 comments
#7 · improved llama sharding · by haoliuhl, closed 1 year ago · 0 comments
#6 · improved llamabpt sharding · by haoliuhl, closed 1 year ago · 0 comments
#5 · train_dataset.download · by lljjgg, closed 1 year ago · 1 comment
#4 · PyTorch Implementation · by conceptofmind, closed 1 year ago · 10 comments
#3 · Rename model.py to model1.py · by Varshi292, closed 1 year ago · 0 comments
#2 · Question: Has this been tested against the Triton Flash Attention version? · by casper-hansen, closed 1 year ago · 10 comments
#1 · How to combine BPT with sequence parallelism? · by fanghgit, closed 1 year ago · 2 comments