jzhang38/EasyContext: Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0 · 649 stars · 47 forks
Issues (newest first)
#54 Inquiry Regarding Zero3 and Sequence Parallelism Compatibility · SihengLi99 · opened 1 month ago · 2 comments
#53 Dependency conflict · SihengLi99 · opened 1 month ago · 0 comments
#52 Saving intermediate checkpoints · 1190303125 · opened 2 months ago · 0 comments
#51 Cannot run the example script successfully · feifeibear · opened 2 months ago · 0 comments
#50 feat: usp (unified sequence parallelism) · feifeibear · closed 1 month ago · 0 comments
#49 Unified sequence parallel · feifeibear · closed 2 months ago · 0 comments
#48 Add usp (Unified Sequence Parallelism) · feifeibear · closed 2 months ago · 0 comments
#47 Size mismatch inside zigzag_ringattention backward · jinghan23 · opened 2 months ago · 0 comments
#46 RuntimeError: CUDA error: an illegal memory access was encountered · uditsharma7 · opened 3 months ago · 1 comment
#45 Is this SFT method or PT method? · 233function · opened 3 months ago · 1 comment
#44 When will the model code support the Qwen series models? · 233function · opened 3 months ago · 2 comments
#43 TypeError: _flash_attn_forward() missing 1 required positional argument: 'softcap' · Ziyang412 · opened 3 months ago · 2 comments
#42 How to estimate the maximum context length this repo can support for larger models? · JingyangDeng · opened 4 months ago · 0 comments
#41 What techniques are used to extend the long context? (translated from Chinese) · zzhdbw · opened 4 months ago · 2 comments
#40 Does this repo work with FSDP or Zero? · LorrinWWW · closed 4 months ago · 1 comment
#39 Logits shift in loss computation · shivamag125 · opened 4 months ago · 1 comment
#38 Does it support SFT training? · Lomax314 · opened 4 months ago · 0 comments
#37 Comparison of different sequence parallel methods · sunying2018 · opened 4 months ago · 1 comment
#36 Dataset length question · 5taku · opened 5 months ago · 2 comments
#35 Will EasyContext support Qwen series models? · WeixuanXiong · opened 5 months ago · 0 comments
#34 May I see your wandb report while training? · fahadh4ilyas · opened 6 months ago · 0 comments
#33 How to generate auto-regressively? · yileld · opened 6 months ago · 0 comments
#32 About seq parallel global batch size · Liu-yuliang · closed 5 months ago · 2 comments
#31 Rotary embedding size mismatch · Toan-Do · closed 5 months ago · 4 comments
#30 Can we just use the Unsloth gradient checkpointing by uncommenting this line? · vkaul11 · opened 6 months ago · 4 comments
#29 Can it train CodeLlama? · 5taku · closed 6 months ago · 2 comments
#28 Support Ulysses flash attention · Kwen-Chen · closed 6 months ago · 1 comment
#27 How to run inference with the model? · laoda513 · opened 6 months ago · 0 comments
#26 Bug: Evals might be broken in pinned HF transformers version `cache=False` · michaelfeil · closed 6 months ago · 2 comments
#25 Shuffle bug? · fmmoret · closed 7 months ago · 3 comments
#24 How to obtain the real whole-batch sequence training loss (reduction_mode=mean)? · littttttlebird · opened 7 months ago · 2 comments
#23 attention_mask · Nianqitongs · opened 7 months ago · 0 comments
#22 Need a running script for `dist_flash_attn` · LzhinFdu · opened 7 months ago · 5 comments
#21 Model stopped updating after 300-400 steps · Bostoncake · closed 5 months ago · 9 comments
#20 Integrate it into the Transformers Trainer? · jkl375 · opened 7 months ago · 1 comment
#19 Appending answer_ids to prompt in `eval_needle.py` · shan18 · closed 7 months ago · 2 comments
#18 Llama-2 models do not support `sliding_window` parameter · Bostoncake · closed 7 months ago · 3 comments
#17 Confused by the train scripts · Bostoncake · closed 7 months ago · 3 comments
#16 LongBench/InfiniteBench · sunying2018 · closed 6 months ago · 0 comments
#15 Danube2 and Unsloth offloaded gradient ck · jzhang38 · closed 7 months ago · 0 comments
#14 Error when the model vocabulary is larger than 120k · microhu · closed 7 months ago · 10 comments
#13 Error when fine-tuning Yi-34B · puppet101 · opened 7 months ago · 2 comments
#12 Data parallel + zigzag_ring_attn support · WallE-Chang · opened 7 months ago · 3 comments
#11 OOM when seq-length=700k · jkl375 · opened 7 months ago · 4 comments
#10 Requirements for input length · LzhinFdu · opened 7 months ago · 2 comments
#9 Training speed is too slow · jkl375 · opened 7 months ago · 2 comments
#8 Not the real auto-regressive decoding mode? · microhu · opened 7 months ago · 1 comment
#7 Dataset description · sunying2018 · closed 4 months ago · 3 comments
#6 Which image is used for this job? · AatroxZZ · opened 7 months ago · 9 comments
#5 Modify interface · jzhang38 · closed 7 months ago · 1 comment