jquesnelle/yarn
YaRN: Efficient Context Window Extension of Large Language Models
MIT License · 1.24k stars · 109 forks
Issues
#62 · Details of the code implementation — cooper12121, opened 2 weeks ago, 0 comments
#61 · Question about the 128K context dataset — OutstanderWang, closed 1 month ago, 1 comment
#60 · Question related to _yarn_linear_ramp_mask — chizhang118, opened 2 months ago, 0 comments
#59 · OOM error in distributed training on 80GB GPUs with Mistral-7b — TracyPlus, opened 2 months ago, 2 comments
#58 · Cannot connect to Hugging Face — RENNY-Jenius, opened 3 months ago, 0 comments
#57 · Why is the updated cache initialized with seqlen=256? — pUmpKin-Co, opened 3 months ago, 0 comments
#56 · Trying to set a tensor of shape torch.Size([257, 1024]) in "weight" (which has shape torch.Size([1226, 1024])); this looks incorrect — Litong-sTs, opened 3 months ago, 0 comments
#55 · Can we replicate the results on 8 × 80GB A100s? — zhanglv0209, opened 3 months ago, 1 comment
#54 · How should I conduct an evaluation with lm-evaluation-harness? — yuri-son, opened 3 months ago, 0 comments
#53 · Questions about DynamicNTK — wutong4012, opened 3 months ago, 0 comments
#52 · OOM error while computing perplexity of 128k Proofpoint documents with the maximum token count set to 128k — HIT-cwh, opened 3 months ago, 0 comments
#51 · Phi 2 — fakerybakery, opened 4 months ago, 0 comments
#50 · Could this repository be used for SFT based on YaRN? — Zheng-Jay, opened 4 months ago, 0 comments
#49 · OOM on two 80GB GPUs — kyleliang919, opened 5 months ago, 5 comments
#48 · Unexpectedly larger perplexity on PG19 — Yiyi-philosophy, opened 6 months ago, 1 comment
#47 · Cannot load safetensor: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 4096])) — tpoisonooo, closed 7 months ago, 4 comments
#46 · Update README.md — tpoisonooo, opened 7 months ago, 1 comment
#45 · deepspeed config crashed for `auto` and OOM — tpoisonooo, closed 7 months ago, 3 comments
#44 · Running error — wangyang-stu, opened 7 months ago, 2 comments
#43 · requirements.txt: require transformers v4.35.0 — cebtenzzre, closed 7 months ago, 1 comment
#42 · 70b — jquesnelle, closed 7 months ago, 0 comments
#41 · Question about YaRN environment configuration (v2) — Yiyi-philosophy, closed 7 months ago, 4 comments
#40 · RoPE scaling config is confusing — tattrongvu, opened 7 months ago, 0 comments
#39 · Inquiry regarding evaluation metrics in your paper — teslacool, opened 7 months ago, 2 comments
#38 · Context length and dataset size — shossain, opened 7 months ago, 0 comments
#37 · Hardware equipment and training time? — zhoumengbo, opened 7 months ago, 0 comments
#36 · What is the purpose of the `finetuned` parameter in `LlamaDynamicYaRNScaledRotaryEmbedding`? — fahadh4ilyas, opened 7 months ago, 0 comments
#35 · Mistral training error with the deepspeed config — xiechengmude, opened 7 months ago, 1 comment
#34 · Questions about max-position-embeddings — Xnhyacinth, opened 8 months ago, 0 comments
#33 · How do I increase the context of an already fine-tuned or base Llama 2 model? — Tejaswi-kashyap-006, opened 8 months ago, 0 comments
#32 · Training takes a long time — Michelleable, closed 8 months ago, 3 comments
#31 · Runtime error — shossain, opened 8 months ago, 0 comments
#30 · A hardcore-mode multiple-passkey evaluation — honglu2875, closed 9 months ago, 0 comments
#29 · It is highest at the lowest dimension and lowest at the highest dimension — eyuansu62, opened 9 months ago, 0 comments
#28 · Dataset preprocessing script — mces89, opened 9 months ago, 1 comment
#27 · A potential bug in scaled_rope/LlamaDynamicScaledRotaryEmbedding.py — pengli09, opened 9 months ago, 0 comments
#26 · Can it be debugged with deepspeed + Trainer? — cableyang, closed 8 months ago, 1 comment
#25 · Training system configuration — shossain, closed 9 months ago, 1 comment
#24 · inv_freq seems not calculated correctly — dwzhu-pku, closed 9 months ago, 9 comments
#23 · What is the recommended GPU setup for fine-tuning? — fyang7, closed 9 months ago, 8 comments
#22 · When will the v2 version of the paper be submitted to arXiv? — bojone, opened 9 months ago, 2 comments
#21 · OOM when doing text generation — sjelassi, opened 9 months ago, 3 comments
#20 · Please write a LICENSE — chatblanc-ciel, closed 9 months ago, 2 comments
#19 · YaRN gets worse results than the NTK-aware-scaling policy in non-fine-tuned scenarios — mmmans, opened 9 months ago, 8 comments
#18 · OSError: [Errno 28] No space left on device — goog, closed 9 months ago, 1 comment
#17 · Should the training be incremental, from 64k to 128k, with the output of the first training passed as input to the next? — sreenivasmrpivot, opened 9 months ago, 5 comments
#16 · Are the 7B and 13B models fine-tuned? — RonanKMcGovern, opened 9 months ago, 3 comments
#15 · Update Errata.md — eltociear, closed 8 months ago, 2 comments
#14 · Confirmation of license — RonanKMcGovern, closed 9 months ago, 2 comments
#13 · Sliding-window perplexity with truncated documents — woominsong, closed 9 months ago, 5 comments