haoliuhl/ringattention
Transformers with Arbitrarily Large Context
Apache License 2.0 · 630 stars · 50 forks
Issues (newest first)
#22 · [BPT] Question about the Scaling Factor in Equation from the Paper · by Xiaoming-Zhao, opened 15 hours ago · 0 comments
#21 · Llama 3 ring attention implementation for inference · by joshpopelka20gmail, opened 3 months ago · 1 comment
#20 · This work doesn't change the kernel, but utilizes dependencies to compute a whole line? · by ziyuhuang123, opened 3 months ago · 0 comments
#19 · Could you provide GPU code for GPUs like the A100? · by ziyuhuang123, opened 3 months ago · 0 comments
#18 · segment_ids_ops · by haoliuhl, closed 4 months ago · 0 comments
#17 · scripts/jax2hf.py error · by liuxpro, opened 5 months ago · 1 comment
#16 · Incorrect project requirements · by hadipash, closed 4 months ago · 1 comment
#15 · Test Script Issues · by djbyrne, opened 7 months ago · 0 comments
#14 · Questions about the paper · by hiroshinoji, opened 7 months ago · 2 comments
#13 · fine-tuning model mismatch - KeyError · by chenwuperth, closed 8 months ago · 0 comments
#12 · (minor) Correct references in ring_attention.py · by Selimonder, closed 8 months ago · 1 comment
#11 · vmem OOM on TPU · by hxssgaa, closed 4 months ago · 2 comments
#10 · Pretrained models? · by matteoguarrera, closed 4 months ago · 1 comment
#9 · JAX partitioning error when attempting to run with sequence parallelism factor not a power of 2 · by exists-forall, opened 1 year ago · 0 comments
#8 · [Question] Add a normalization layer between Attention and FFN? · by findmyway, closed 1 year ago · 4 comments
#7 · improved llama sharding · by haoliuhl, closed 1 year ago · 0 comments
#6 · improved llamabpt sharding · by haoliuhl, closed 1 year ago · 0 comments
#5 · train_dataset.download · by lljjgg, closed 1 year ago · 1 comment
#4 · PyTorch Implementation · by conceptofmind, closed 1 year ago · 10 comments
#3 · Rename model.py to model1.py · by Varshi292, closed 1 year ago · 0 comments
#2 · Question: Has this been tested against the Triton Flash Attention version? · by casper-hansen, closed 1 year ago · 10 comments
#1 · How to combine BPT with sequence parallelism? · by fanghgit, closed 1 year ago · 2 comments