microsoft/torchscale
Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License, 2.98k stars, 201 forks
Issues (newest first)
#59 embed_tokens (CodeMiningCZW; opened 11 months ago; 4 comments)
#58 Question about is_first_step and Retnet (tdomhan; closed 11 months ago; 2 comments)
#57 Retnet parameter dimension (allanj; closed 11 months ago; 2 comments)
#56 "sentencepiece.bpe.model" and "dict.txt" in page below seem not available (HuXinjing; closed 11 months ago; 2 comments)
#55 Retnet training is slow (Zth9730; closed 11 months ago; 2 comments)
#54 RetNet: Check consistency of each forward mode (mmorinag127; closed 11 months ago; 9 comments)
#53 Is there some example of the paper? e.g., compare of the inference latency (LiZeng001; closed 11 months ago; 1 comment)
#52 Training & Inference examples for RetNet (jhl-Det; closed 11 months ago; 1 comment)
#51 fix chunkwise inconsistency bug (sunyt32; closed 11 months ago; 0 comments)
#50 Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward (wangmengzhi; closed 11 months ago; 0 comments)
#49 RetNet: relative position (fkodom; closed 11 months ago; 5 comments)
#48 Multi-Scale Retention: Why include position embeddings explicitly? (fkodom; closed 11 months ago; 3 comments)
#47 scale.sqrt() in the recurrent_forward function of the multiscale_retention module (wangmengzhi; closed 11 months ago; 6 comments)
#46 Update epsilon in retention (sunyt32; closed 12 months ago; 0 comments)
#45 LEX inference support and checkpoint (RulinShao; closed 11 months ago; 5 comments)
#44 recurrent_forward in MultiScaleRetention (Anker-ZX-AI; closed 1 year ago; 1 comment)
#43 AttributeError: 'EncoderConfig' object has no attribute 'decoder_layers' (dedekinds; closed 12 months ago; 2 comments)
#42 the meaning of "incremental_state" in RetNet (jhl-Det; closed 12 months ago; 3 comments)
#41 can not download dict.txt (robotzheng; closed 12 months ago; 2 comments)
#40 Inconsistent recurrent and parallel results for RetNet (YirunKCL; closed 1 year ago; 4 comments)
#39 Config fix (agoryuno; opened 1 year ago; 0 comments)
#38 Remove inheritance from `object` (agoryuno; opened 1 year ago; 2 comments)
#37 Longnet Code Release (arnavdantuluri; closed 7 months ago; 13 comments)
#36 testing very large attention windows (fredzannarbor; opened 1 year ago; 0 comments)
#35 About the param `scale_base` (horizon94; closed 7 months ago; 1 comment)
#34 some result plots are not shown (klae01; closed 1 year ago; 1 comment)
#33 support lm prefix computation in one go (XingxingZhang; closed 1 year ago; 0 comments)
#32 EncoderDecoder Configuration Issue (klae01; closed 1 year ago; 1 comment)
#31 add basic test (klae01; closed 1 year ago; 1 comment)
#30 make pgs global (njb-ms; closed 1 year ago; 2 comments)
#29 question about the number of output_projection (violet-sto; closed 1 year ago; 1 comment)
#28 xPos cross-attention change (janEbert; closed 1 year ago; 2 comments)
#27 Bump timm version to latest (JonathanRayner; closed 1 year ago; 0 comments)
#25 Fairseq version compatible with torchscale (sjelassi; closed 1 year ago; 1 comment)
#24 Swapped naive dot product attention for flash attention (usryokousha; opened 1 year ago; 4 comments)
#23 About running speed (NieShenRuc; opened 1 year ago; 0 comments)
#22 Could not install fairseq (BaohaoLiao; closed 1 year ago; 1 comment)
#21 v0.2.0 (shumingma; closed 1 year ago; 0 comments)
#20 fx BERT + moe (buaahsh; closed 1 year ago; 0 comments)
#19 Update README.md (buaahsh; closed 1 year ago; 0 comments)
#18 Installer bug: wrong `apex` package installed (jph00; closed 1 year ago; 2 comments)
#17 SMOE or XMOE Network how to "evaluate" and "save and resume" (randomtutu; closed 1 year ago; 2 comments)
#16 Questions about the implementation of deepnorm (jiaohuix; closed 1 year ago; 2 comments)
#15 [Question] what are the usages of multiway_network.py? (yiqiwang8177; closed 1 year ago; 2 comments)
#14 Does Torchscale support vision transformers in vision tasks? (nightsnack; closed 1 year ago; 5 comments)
#13 Q) Tensor parallel for magneto (taehwakkwon; closed 1 year ago; 9 comments)
#12 Batch size first (shumingma; closed 1 year ago; 0 comments)
#11 Adding the official implementation of Xpos (https://arxiv.org/abs/2212.10554) (shumingma; closed 1 year ago; 0 comments)
#10 Library issues (PrAsAnNaRePo; closed 1 year ago; 2 comments)
#9 fix a bug that overrides the default constructed output_projection (MatthewChang; closed 1 year ago; 1 comment)