microsoft torchscale issues

microsoft / torchscale

Foundation Architecture for (M)LLMs

https://aka.ms/GeneralAI

MIT License

3.01k stars · 202 forks

Issues (sorted by newest)

torchscale 0.3.0 requires fairscale==0.4.0, but you have fairscale 0.4.13 which is incompatible.

#110 pandayummy opened 1 month ago · 0 comments

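Issue #110 quotes a pip dependency-resolver conflict: torchscale 0.3.0 pins fairscale==0.4.0, but fairscale 0.4.13 is installed. Below is a minimal sketch, not part of torchscale, that checks installed package versions against exact pins using the standard library's importlib.metadata. The REQUIRED mapping and the check_pins helper are hypothetical names chosen for illustration. The only pin they contain is the one stated in the error message.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical pin table; the single entry is the requirement quoted in issue #110.
REQUIRED = {"fairscale": "0.4.0"}


def check_pins(required, get_version=version):
    """Return {package: (required, installed)} for every exact pin that is not met.

    An installed value of None means the package is not installed at all.
    get_version defaults to importlib.metadata.version and can be swapped out for testing.
    """
    mismatches = {}
    for pkg, want in required.items():
        try:
            have = get_version(pkg)
        except PackageNotFoundError:
            have = None
        # Exact string comparison, matching an "==" pin.
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in check_pins(REQUIRED).items():
        print(f"{pkg}: need =={want}, found {have}")
```

If the check reports a mismatch, reinstalling the pinned version (for example `pip install fairscale==0.4.0`) is one way to satisfy the pin. The issue does not say whether torchscale actually fails with 0.4.13.
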
Minecraft

#109 Pelaez99 closed 3 months ago · 0 comments

Question about LongNet attention map overlap

#108 RmZeta2718 opened 4 months ago · 0 comments

Different batch sizes lead to different evaluation results for LongViT

#107 HHHedo opened 5 months ago · 0 comments

How to test the model

#106 ReloJeffrey opened 5 months ago · 0 comments

pip error

#105 wanghaoran-ucas opened 5 months ago · 0 comments

Where is the offset implemented in Multi-head dilated attention?

#104 AshStuff opened 6 months ago · 0 comments

can't use longvit

#103 abebe9849 opened 6 months ago · 0 comments

Question about learnable segment lengths and dilation rates

#102 benrousePUC opened 6 months ago · 0 comments

How to use retention in RetNet for cross-attention?

#101 yxchng opened 6 months ago · 0 comments

renames longnet file; longnet example in readme works now

#100 JacksonSearle opened 7 months ago · 1 comment

Checkpoint for RetNet

#99 macsz opened 7 months ago · 0 comments

What WSI level was used for pretraining LongVit?

#98 jpfeil closed 7 months ago · 1 comment

about attention mask

#97 hichoe95 closed 8 months ago · 0 comments

Bump pillow from 10.0.0 to 10.2.0 in /examples/longvit

#96 dependabot[bot] opened 8 months ago · 0 comments

about the longnet's ppl

#95 robotzheng opened 9 months ago · 2 comments

Update requirements.txt

#94 I8dNLo closed 9 months ago · 0 comments

Bump transformers from 4.8.1 to 4.36.0 in /examples/longvit

#93 dependabot[bot] opened 10 months ago · 0 comments

Fix No module named 'torch._six'

#92 ahmedhshahin opened 10 months ago · 0 comments

Bump pyarrow from 9.0.0 to 14.0.1 in /examples/longvit

#91 dependabot[bot] opened 10 months ago · 0 comments

Bump scipy from 1.6.3 to 1.10.0 in /examples/longvit

#90 dependabot[bot] opened 10 months ago · 0 comments

Bump pillow from 10.0.0 to 10.0.1 in /examples/longvit

#89 dependabot[bot] closed 8 months ago · 1 comment

Bump transformers from 4.8.1 to 4.30.0 in /examples/longvit

#88 dependabot[bot] closed 10 months ago · 1 comment

Release LongNet and LongViT

#87 shumingma closed 10 months ago · 0 comments

Wrong Rnm Normalization.

#86 pdradx opened 10 months ago · 1 comment

Introducing padding_mask to RetNet

#85 xtwigs opened 10 months ago · 2 comments

Question regarding the configuration of decoder_retention_heads

#84 Kratos-Wen opened 10 months ago · 2 comments

Training RetNet on A100 GPUs

#83 Antoine-Bergerault opened 10 months ago · 1 comment

[Minor issue] Discrepancy inside arxiv paper

#82 radarFudan opened 11 months ago · 0 comments

Question about the normalization in attention

#81 Cranial-XIX closed 10 months ago · 2 comments

Question about RetNetRelPos

#80 hyunwoongko closed 10 months ago · 2 comments

about gamma/decay in RetNet

#79 rouniuyizu closed 11 months ago · 2 comments

typo in normalization denominator in parallel retention?

#78 XintianHan closed 11 months ago · 1 comment

Chunk recurrent representation incorrect results

#77 N0r9st closed 11 months ago · 7 comments

Query about Retentive Network's Recurrent Representation

#76 gopi-erabati closed 11 months ago · 1 comment

About training memory

#75 HoraceXIaoyiBao closed 11 months ago · 2 comments

BEiT3 Vision-Language Expert question

#74 andreapdr closed 1 year ago · 4 comments

AttributeError: 'EncoderDecoderConfig' object has no attribute 'normalize_output'

#73 Yuki2L0ve closed 12 months ago · 3 comments

RuntimeError: The size of tensor a (5) must match the size of tensor b (2) at non-singleton dimension 0

#72 codinglover0111 closed 11 months ago · 3 comments

Compatibility with torchsummary

#71 lzqlzzq closed 11 months ago · 1 comment

fix fairseq example

#70 sunyt32 closed 1 year ago · 0 comments

Update new RetNet settings

#69 sunyt32 closed 1 year ago · 0 comments

initialization of qkv

#68 XintianHan closed 1 year ago · 3 comments

pip package does not contain RetNet

#67 fabienGenhealth closed 12 months ago · 2 comments

Question on decay factor for attention with xPos

#66 mvbakulin closed 1 year ago · 1 comment

There's a point of confusion in torchscale

#65 lovekang3344 closed 12 months ago · 3 comments

RetNet training config

#64 hanlinxuy opened 1 year ago · 7 comments

Could you please explain the reason behind defining TEMPERATURE_FOR_L_UAX in the code without actually using it?

#63 Ruiyuan-Zhang closed 1 year ago · 1 comment

Question about the recurrent forward of MultiScaleRetention

#62 LEECHOONGHO closed 1 year ago · 2 comments

Can Torchscale be applied in point cloud tasks?

#61 huiyang0613 closed 1 year ago · 2 comments