jquesnelle/yarn
YaRN: Efficient Context Window Extension of Large Language Models
MIT License · 1.24k stars · 109 forks
Issues
#62 · Details of the code implementation — cooper12121, opened 2 weeks ago, 0 comments
#61 · Question about the 128K context dataset — OutstanderWang, closed 1 month ago, 1 comment
#60 · Question related to _yarn_linear_ramp_mask — chizhang118, opened 2 months ago, 0 comments
#59 · OOM error in distributed training on 80GB GPUs with Mistral-7b — TracyPlus, opened 2 months ago, 2 comments
#58 · Cannot connect to Hugging Face — RENNY-Jenius, opened 3 months ago, 0 comments
#57 · Why is the updated cache initialized with seqlen=256? — pUmpKin-Co, opened 3 months ago, 0 comments
#56 · Trying to set a tensor of shape torch.Size([257, 1024]) in "weight" (which has shape torch.Size([1226, 1024])); this looks incorrect — Litong-sTs, opened 3 months ago, 0 comments
#55 · Can we replicate the results on 8 × 80GB A100s? — zhanglv0209, opened 3 months ago, 1 comment
#54 · How should I conduct an evaluation with lm-evaluation-harness? — yuri-son, opened 3 months ago, 0 comments
#53 · Questions about DynamicNTK — wutong4012, opened 3 months ago, 0 comments
#52 · OOM error while computing perplexity of 128k Proofpoint documents with the maximum token count set to 128k — HIT-cwh, opened 3 months ago, 0 comments
#51 · Phi 2 — fakerybakery, opened 4 months ago, 0 comments
#50 · Could this repository be used for SFT based on YaRN? — Zheng-Jay, opened 4 months ago, 0 comments
#49 · OOM on two 80GB GPUs — kyleliang919, opened 5 months ago, 5 comments
#48 · Unexpectedly larger perplexity on PG19 — Yiyi-philosophy, opened 6 months ago, 1 comment
#47 · Cannot load safetensor: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 4096])) — tpoisonooo, closed 7 months ago, 4 comments
#46 · Update README.md — tpoisonooo, opened 7 months ago, 1 comment
#45 · deepspeed config crashed for `auto` and OOM — tpoisonooo, closed 7 months ago, 3 comments
#44 · Running error — wangyang-stu, opened 7 months ago, 2 comments
#43 · requirements.txt: require transformers v4.35.0 — cebtenzzre, closed 7 months ago, 1 comment
#42 · 70b — jquesnelle, closed 7 months ago, 0 comments
#41 · Question about YaRN environment configuration (v2) — Yiyi-philosophy, closed 7 months ago, 4 comments
#40 · RoPE scaling config is confusing — tattrongvu, opened 7 months ago, 0 comments
#39 · Inquiry regarding evaluation metrics in your paper — teslacool, opened 7 months ago, 2 comments
#38 · Context length and dataset size — shossain, opened 7 months ago, 0 comments
#37 · Hardware equipment and training time? — zhoumengbo, opened 7 months ago, 0 comments
#36 · What is the purpose of the `finetuned` parameter in `LlamaDynamicYaRNScaledRotaryEmbedding`? — fahadh4ilyas, opened 7 months ago, 0 comments
#35 · Mistral training error with the deepspeed config — xiechengmude, opened 7 months ago, 1 comment
#34 · Questions about max-position-embeddings — Xnhyacinth, opened 8 months ago, 0 comments
#33 · How do I increase the context of an already fine-tuned or base Llama 2 model? — Tejaswi-kashyap-006, opened 8 months ago, 0 comments
#32 · Training takes a long time — Michelleable, closed 8 months ago, 3 comments
#31 · Runtime error — shossain, opened 8 months ago, 0 comments
#30 · A hardcore-mode multiple-passkey evaluation — honglu2875, closed 9 months ago, 0 comments
#29 · It is highest at the lowest dimension and lowest at the highest dimension — eyuansu62, opened 9 months ago, 0 comments
#28 · Dataset preprocessing script — mces89, opened 9 months ago, 1 comment
#27 · A potential bug in scaled_rope/LlamaDynamicScaledRotaryEmbedding.py — pengli09, opened 9 months ago, 0 comments
#26 · Can it be debugged with deepspeed + Trainer? — cableyang, closed 8 months ago, 1 comment
#25 · Training system configuration — shossain, closed 9 months ago, 1 comment
#24 · inv_freq seems not calculated correctly — dwzhu-pku, closed 9 months ago, 9 comments
#23 · What is the recommended GPU setup for fine-tuning? — fyang7, closed 9 months ago, 8 comments
#22 · When will the v2 version of the paper be submitted to arXiv? — bojone, opened 9 months ago, 2 comments
#21 · OOM when doing text generation — sjelassi, opened 9 months ago, 3 comments
#20 · Please write a LICENSE — chatblanc-ciel, closed 9 months ago, 2 comments
#19 · YaRN gets worse results than the NTK-aware-scaling policy in non-fine-tuned scenarios — mmmans, opened 9 months ago, 8 comments
#18 · OSError: [Errno 28] No space left on device — goog, closed 9 months ago, 1 comment
#17 · Should the training be incremental, from 64k to 128k, with the output of the first training passed as input to the next? — sreenivasmrpivot, opened 9 months ago, 5 comments
#16 · Are the 7B and 13B models fine-tuned? — RonanKMcGovern, opened 9 months ago, 3 comments
#15 · Update Errata.md — eltociear, closed 8 months ago, 2 comments
#14 · Confirmation of license — RonanKMcGovern, closed 9 months ago, 2 comments
#13 · Sliding-window perplexity with truncated documents — woominsong, closed 9 months ago, 5 comments