jzhang38/EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
Apache License 2.0
527 stars
33 forks
issues
Dataset length question
#36
5taku
opened
1 week ago
2
Will EasyContext support Qwen series model?
#35
WeixuanXiong
opened
3 weeks ago
0
May I see your wandb report while training?
#34
fahadh4ilyas
opened
1 month ago
0
How to auto-regression generate?
#33
yileld
opened
1 month ago
0
about seq parallel global batch size
#32
Liu-yuliang
closed
2 weeks ago
2
Rotary embedding size mismatch
#31
Toan-Do
closed
3 weeks ago
4
Can we just use the sloth gradient checkpointing by uncommenting this line?
#30
vkaul11
opened
1 month ago
4
can training codellama?
#29
5taku
closed
1 month ago
2
Support ulysses flash attn
#28
Kwen-Chen
closed
1 month ago
1
how to infer the model?
#27
laoda513
opened
1 month ago
0
Bug: Evals might be broken in pinned HF transformers version `cache=False`
#26
michaelfeil
closed
1 month ago
2
shuffle bug?
#25
fmmoret
closed
2 months ago
3
how to acquire the real whole batch sequence training loss (reduction_mode=mean)?
#24
littttttlebird
opened
2 months ago
1
attention_mask
#23
Nianqitongs
opened
2 months ago
0
Need a running script for ‘dist_flash_attn’
#22
LzhinFdu
opened
2 months ago
5
Model stopped updating after 300-400 steps.
#21
Bostoncake
closed
6 days ago
8
integrate it into the Transformers Trainer?
#20
jkl375
opened
2 months ago
1
Appending answer_ids to prompt in `eval_needle.py`
#19
shan18
closed
2 months ago
2
Llama-2 models do not support `sliding_window` parameter
#18
Bostoncake
closed
2 months ago
3
Confused by the train scripts
#17
Bostoncake
closed
2 months ago
3
LongBench/InfiniteBench
#16
sunying2018
closed
1 month ago
0
Danube2 and Unsloth offloaded gradient ck
#15
jzhang38
closed
2 months ago
0
Error when the model vocabulary is larger than 120k
#14
microhu
closed
2 months ago
10
error when finetuning yi-34b
#13
puppet101
opened
2 months ago
0
Data parallel + zigzag_ring_attn support
#12
WallE-Chang
opened
2 months ago
1
OOM when seq-length=700k
#11
jkl375
opened
2 months ago
4
Requirements for input length
#10
LzhinFdu
opened
2 months ago
2
train speed is too slow
#9
jkl375
opened
2 months ago
2
Not the real auto-regressive decoding mode ?
#8
microhu
opened
2 months ago
1
dataset description
#7
sunying2018
opened
2 months ago
3
Which image is used for this job?
#6
AatroxZZ
opened
2 months ago
9
Modify interface
#5
jzhang38
closed
2 months ago
1
Lightseq
#4
jzhang38
closed
2 months ago
5
Does the input sharding match exact optimization of long sequence?
#3
guanzhchen
closed
2 months ago
2
Switching to monkey patch
#2
jzhang38
closed
2 months ago
0
LICENSE
#1
fmmoret
closed
2 months ago
1