microsoft torchscale issues

microsoft / torchscale

Foundation Architecture for (M)LLMs

https://aka.ms/GeneralAI

MIT License

2.98k stars 201 forks

issues

embed_tokens

#59 CodeMiningCZW opened 11 months ago
4
Question about is_first_step and Retnet

#58 tdomhan closed 11 months ago
2
Retnet parameter dimension

#57 allanj closed 11 months ago
2
"sentencepiece.bpe.model" and "dict.txt" in page below seem not available

#56 HuXinjing closed 11 months ago
2
Retnet training is slow

#55 Zth9730 closed 11 months ago
2
RetNet : Check consistency of each forward mode

#54 mmorinag127 closed 11 months ago
9
Is there some example of the paper? e.g., compare of the inference latency

#53 LiZeng001 closed 11 months ago
1
Training & Inference examples for RetNet

#52 jhl-Det closed 11 months ago
1
fix chunkwise inconsistency bug

#51 sunyt32 closed 11 months ago
0
Adding sqrt in the recurrent_forward of retnet to make it consistent with parallel_forward

#50 wangmengzhi closed 11 months ago
0
RetNet: relative position

#49 fkodom closed 11 months ago
5
Multi-Scale Retention: Why include position embeddings explicitly?

#48 fkodom closed 11 months ago
3
scale.sqrt() in the recurrent_forward function of the multiscale_retention module

#47 wangmengzhi closed 11 months ago
6
Update epsilon in retention

#46 sunyt32 closed 12 months ago
0
LEX inference support and checkpoint

#45 RulinShao closed 11 months ago
5
recurrent_forward in MultiScaleRetention

#44 Anker-ZX-AI closed 1 year ago
1
AttributeError: 'EncoderConfig' object has no attribute 'decoder_layers'

#43 dedekinds closed 12 months ago
2
the meaning of "incremental_state" in RetNet

#42 jhl-Det closed 12 months ago
3
can not download dict.txt

#41 robotzheng closed 12 months ago
2
Inconsist recurrent and parallel results for RetNet

#40 YirunKCL closed 1 year ago
4
Config fix

#39 agoryuno opened 1 year ago
0
Remove inheritance from `object`

#38 agoryuno opened 1 year ago
2
Longnet Code Release

#37 arnavdantuluri closed 7 months ago
13
testing very large attention windows

#36 fredzannarbor opened 1 year ago
0
About the param `scale_base`

#35 horizon94 closed 7 months ago
1
some result plots are not show

#34 klae01 closed 1 year ago
1
support lm prefix computation in one go

#33 XingxingZhang closed 1 year ago
0
EncoderDecoder Configuration Issue

#32 klae01 closed 1 year ago
1
add basic test

#31 klae01 closed 1 year ago
1
make pgs global

#30 njb-ms closed 1 year ago
2
question about the number of output_projection

#29 violet-sto closed 1 year ago
1
xPos cross-attention change

#28 janEbert closed 1 year ago
2
Bump timm version to latest

#27 JonathanRayner closed 1 year ago
0
Fairseq version compatible with torchscale

#25 sjelassi closed 1 year ago
1
Swapped naive dot product attention for flash attention

#24 usryokousha opened 1 year ago
4
About running speed

#23 NieShenRuc opened 1 year ago
0
Could not install fairseq

#22 BaohaoLiao closed 1 year ago
1
v0.2.0

#21 shumingma closed 1 year ago
0
fx BERT + moe

#20 buaahsh closed 1 year ago
0
Update README.md

#19 buaahsh closed 1 year ago
0
Installer bug - wrong `apex` package installed

#18 jph00 closed 1 year ago
2
SMOE or XMOE Network how to "evaluate" and "save and resume"

#17 randomtutu closed 1 year ago
2
Questions about the implementation of deepnorm

#16 jiaohuix closed 1 year ago
2
[Question] what are the usages of multiway_network.py?

#15 yiqiwang8177 closed 1 year ago
2
Does Torchscale support vision transformers in vision tasks?

#14 nightsnack closed 1 year ago
5
Q) Tensor parallel for magneto

#13 taehwakkwon closed 1 year ago
9
Batch size first

#12 shumingma closed 1 year ago
0
Adding the official implementation of Xpos (https://arxiv.org/abs/2212.10554)

#11 shumingma closed 1 year ago
0
Library issues

#10 PrAsAnNaRePo closed 1 year ago
2
fix a bug that overrides the default constructed output_projection

#9 MatthewChang closed 1 year ago
1
