jzhang38 EasyContext issues

jzhang38 / EasyContext

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Apache License 2.0

649 stars 47 forks

issues

Inquiry Regarding Zero3 and Sequence Parallelism Compatibility

#54 SihengLi99 opened 1 month ago
2
dependency conflict

#53 SihengLi99 opened 1 month ago
0
saving intermediate checkpoints

#52 1190303125 opened 2 months ago
0
Cannot run the example script successfully.

#51 feifeibear opened 2 months ago
0
feat: usp (unified sequence parallelism)

#50 feifeibear closed 1 month ago
0
unified sequence parallel

#49 feifeibear closed 2 months ago
0
add usp (Unified Sequence Parallelism)

#48 feifeibear closed 2 months ago
0
Size mismatch inside zigzag_ringattention backward

#47 jinghan23 opened 2 months ago
0
RuntimeError: CUDA error: an illegal memory access was encountered

#46 uditsharma7 opened 3 months ago
1
Is this an SFT method or a PT method?

#45 233function opened 3 months ago
1
When will the model code support the Qwen series models?

#44 233function opened 3 months ago
2
TypeError: _flash_attn_forward() missing 1 required positional argument: 'softcap'

#43 Ziyang412 opened 3 months ago
2
How to estimate the maximum context length this repo can support for larger models?

#42 JingyangDeng opened 4 months ago
0
What technique is used to extend the context length?

#41 zzhdbw opened 4 months ago
2
Does this repo work with FSDP or Zero?

#40 LorrinWWW closed 4 months ago
1
Logits shift in loss computation

#39 shivamag125 opened 4 months ago
1
Does it support SFT training?

#38 Lomax314 opened 4 months ago
0
comparison of different sequence parallel methods

#37 sunying2018 opened 4 months ago
1
Dataset length question

#36 5taku opened 5 months ago
2
Will EasyContext support Qwen series model?

#35 WeixuanXiong opened 5 months ago
0
May I see your wandb report while training?

#34 fahadh4ilyas opened 6 months ago
0
How to generate auto-regressively?

#33 yileld opened 6 months ago
0
about seq parallel global batch size

#32 Liu-yuliang closed 5 months ago
2
Rotary embedding size mismatch

#31 Toan-Do closed 5 months ago
4
Can we just use the Unsloth gradient checkpointing by uncommenting this line?

#30 vkaul11 opened 6 months ago
4
Can it train CodeLlama?

#29 5taku closed 6 months ago
2
Support ulysses flash attn

#28 Kwen-Chen closed 6 months ago
1
how to infer the model?

#27 laoda513 opened 6 months ago
0
Bug: Evals might be broken in pinned HF transformers version `cache=False`

#26 michaelfeil closed 6 months ago
2
shuffle bug?

#25 fmmoret closed 7 months ago
3
How to acquire the real whole-batch sequence training loss (reduction_mode=mean)?

#24 littttttlebird opened 7 months ago
2
attention_mask

#23 Nianqitongs opened 7 months ago
0
Need a running script for ‘dist_flash_attn’

#22 LzhinFdu opened 7 months ago
5
Model stopped updating after 300-400 steps.

#21 Bostoncake closed 5 months ago
9
integrate it into the Transformers Trainer?

#20 jkl375 opened 7 months ago
1
Appending answer_ids to prompt in `eval_needle.py`

#19 shan18 closed 7 months ago
2
Llama-2 models do not support `sliding_window` parameter

#18 Bostoncake closed 7 months ago
3
Confused by the train scripts

#17 Bostoncake closed 7 months ago
3
LongBench/InfiniteBench

#16 sunying2018 closed 6 months ago
0
Danube2 and Unsloth offloaded gradient ck

#15 jzhang38 closed 7 months ago
0
Error when the model vocabulary is larger than 120k

#14 microhu closed 7 months ago
10
error when finetuning yi-34b

#13 puppet101 opened 7 months ago
2
Data parallel + zigzag_ring_attn support

#12 WallE-Chang opened 7 months ago
3
OOM when seq-length=700k

#11 jkl375 opened 7 months ago
4
Requirements for input length

#10 LzhinFdu opened 7 months ago
2
train speed is too slow

#9 jkl375 opened 7 months ago
2
Not the real auto-regressive decoding mode?

#8 microhu opened 7 months ago
1
dataset description

#7 sunying2018 closed 4 months ago
3
Which image is used for this job?

#6 AatroxZZ opened 7 months ago
9
Modify interface

#5 jzhang38 closed 7 months ago
1