microsoft/torchscale
Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
3.01k stars
202 forks
issues
torchscale 0.3.0 requires fairscale==0.4.0, but you have fairscale 0.4.13 which is incompatible.
#110
pandayummy
opened
1 month ago
0
Minecraft
#109
Pelaez99
closed
3 months ago
0
Question about LongNet attention map overlap
#108
RmZeta2718
opened
4 months ago
0
Different batch sizes lead to different evaluation results for LongVIT
#107
HHHedo
opened
5 months ago
0
How to test the model
#106
ReloJeffrey
opened
5 months ago
0
pip error
#105
wanghaoran-ucas
opened
5 months ago
0
Where is the offset implemented in Multi-head dilated attention ?
#104
AshStuff
opened
6 months ago
0
can't use longvit
#103
abebe9849
opened
6 months ago
0
Question about learnable segment lengths and dilation rates
#102
benrousePUC
opened
6 months ago
0
How to use retention in RetNet for cross-attention?
#101
yxchng
opened
6 months ago
0
renames longnet file; longnet example in readme works now
#100
JacksonSearle
opened
7 months ago
1
Checkpoint for RetNet
#99
macsz
opened
7 months ago
0
What WSI level was used for pretraining LongVit?
#98
jpfeil
closed
7 months ago
1
about attention mask
#97
hichoe95
closed
8 months ago
0
Bump pillow from 10.0.0 to 10.2.0 in /examples/longvit
#96
dependabot[bot]
opened
8 months ago
0
about the longnet's ppl
#95
robotzheng
opened
9 months ago
2
Update requirements.txt
#94
I8dNLo
closed
9 months ago
0
Bump transformers from 4.8.1 to 4.36.0 in /examples/longvit
#93
dependabot[bot]
opened
10 months ago
0
Fix No module named 'torch._six'
#92
ahmedhshahin
opened
10 months ago
0
Bump pyarrow from 9.0.0 to 14.0.1 in /examples/longvit
#91
dependabot[bot]
opened
10 months ago
0
Bump scipy from 1.6.3 to 1.10.0 in /examples/longvit
#90
dependabot[bot]
opened
10 months ago
0
Bump pillow from 10.0.0 to 10.0.1 in /examples/longvit
#89
dependabot[bot]
closed
8 months ago
1
Bump transformers from 4.8.1 to 4.30.0 in /examples/longvit
#88
dependabot[bot]
closed
10 months ago
1
Release LongNet and LongViT
#87
shumingma
closed
10 months ago
0
Wrong Rnm Normalization.
#86
pdradx
opened
10 months ago
1
Introducing padding_mask to RetNet
#85
xtwigs
opened
10 months ago
2
Question regarding the configuration of decoder_retention_heads
#84
Kratos-Wen
opened
10 months ago
2
Training RetNet on A100 GPUs
#83
Antoine-Bergerault
opened
10 months ago
1
[Minor issue] Discrepancy inside arxiv paper
#82
radarFudan
opened
11 months ago
0
Question about the normalization in attention
#81
Cranial-XIX
closed
10 months ago
2
Question about RetNetRelPos
#80
hyunwoongko
closed
10 months ago
2
about gamma/decay in RetNet
#79
rouniuyizu
closed
11 months ago
2
typo in normalization denominator in parallel retention?
#78
XintianHan
closed
11 months ago
1
Chunk recurrent representation incorrect results
#77
N0r9st
closed
11 months ago
7
Query about Retentive Network's Recurrent Representation
#76
gopi-erabati
closed
11 months ago
1
About training memory
#75
HoraceXIaoyiBao
closed
11 months ago
2
BEiT3 Vision-Language Expert question
#74
andreapdr
closed
1 year ago
4
AttributeError: 'EncoderDecoderConfig' object has no attribute 'normalize_output'
#73
Yuki2L0ve
closed
12 months ago
3
RuntimeError: The size of tensor a (5) must match the size of tensor b (2) at non-singleton dimension 0
#72
codinglover0111
closed
11 months ago
3
Compatibility with torchsummary
#71
lzqlzzq
closed
11 months ago
1
fix fairseq example
#70
sunyt32
closed
1 year ago
0
Update new RetNet settings
#69
sunyt32
closed
1 year ago
0
initialization of qkv
#68
XintianHan
closed
1 year ago
3
pip package does not contain RetNet
#67
fabienGenhealth
closed
12 months ago
2
Question on decay factor for attention with xPos
#66
mvbakulin
closed
1 year ago
1
There're a confusion in torchscale
#65
lovekang3344
closed
12 months ago
3
retnet training config
#64
hanlinxuy
opened
1 year ago
7
Could you please explain the reason behind defining TEMPERATURE_FOR_L_UAX in the code without actually using it?
#63
Ruiyuan-Zhang
closed
1 year ago
1
Question about the recurrent forward of MultiScaleRetention
#62
LEECHOONGHO
closed
1 year ago
2
Can Torchscale be applied in point cloud tasks?
#61
huiyang0613
closed
1 year ago
2