syncdoth/RetNet
Huggingface-compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf), including parallel, recurrent, and chunkwise forward passes.
MIT License · 226 stars · 24 forks
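As background for the recurring parallel-vs-recurrent questions in the issues below (e.g. #39, #15, #7), here is a minimal single-head sketch of the two retention forms from the paper. It is an illustration under simplifying assumptions, not this repository's API: it omits multi-head retention, the paper's multi-scale decay, group normalization, gating, and position rotation.

    # Minimal single-head retention sketch (illustration only, NOT this repo's code).
    # Parallel form:  O = (Q K^T . D) V, with D[n, m] = gamma^(n - m) for n >= m, else 0.
    # Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n,  o_n = q_n S_n.
    import torch

    def parallel_retention(q, k, v, gamma):
        # q, k: (T, d_k); v: (T, d_v); processes the whole sequence at once.
        t = torch.arange(q.shape[0])
        decay = gamma ** (t[:, None] - t[None, :]).float()  # gamma^(n - m)
        d = torch.tril(decay)                               # causal: zero where m > n
        return (q @ k.T * d) @ v

    def recurrent_retention(q, k, v, gamma):
        # Same output computed token by token with a (d_k, d_v) state.
        state = torch.zeros(q.shape[1], v.shape[1])
        outs = []
        for n in range(q.shape[0]):
            state = gamma * state + k[n, :, None] @ v[n, None, :]  # outer product k_n^T v_n
            outs.append(q[n, None, :] @ state)                     # o_n = q_n S_n
        return torch.cat(outs)

    torch.manual_seed(0)
    q, k, v = (torch.randn(8, 16) for _ in range(3))
    assert torch.allclose(parallel_retention(q, k, v, 0.9),
                          recurrent_retention(q, k, v, 0.9), atol=1e-4)

The parallel form is quadratic in sequence length but trains efficiently on whole sequences; the recurrent form costs O(1) per token, which is what makes inference cheap. The chunkwise forward mentioned in the description interpolates between the two: the parallel form runs within each chunk, and the recurrent state is carried across chunk boundaries.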
Issues
Should I pass attention_mask during the SFT process? · #40 · wac81 · opened 7 months ago · 0 comments
Parallel inference is expected to be faster than recurrent inference, but in the play file it turns out not to be · #39 · wac81 · opened 7 months ago · 0 comments
A question about inference · #38 · wac81 · opened 7 months ago · 0 comments
Integration with transformers library · #37 · kiucho · opened 7 months ago · 0 comments
HuggingFace checkpoint · #36 · xtwigs · opened 10 months ago · 2 comments
How to load the model with device_map="auto" · #35 · wac81 · opened 10 months ago · 1 comment
The number of parameters does not match the settings in the paper · #34 · ziHoHe · opened 10 months ago · 1 comment
Can't train the 3B model on a single 48GB card · #33 · wac81 · closed 7 months ago · 2 comments
Can you support streaming when generating? · #32 · wac81 · opened 10 months ago · 2 comments
Initialize word embedding layer · #31 · hyunwoongko · closed 10 months ago · 7 comments
Info/Documentation on chunkwise training · #30 · pkpro · opened 11 months ago · 5 comments
Added description for torch.compile · #29 · ce-lery · opened 12 months ago · 1 comment
gradient_checkpointing=True issue in TrainerArgument · #28 · lolshuo · closed 1 year ago · 1 comment
Would it be possible to integrate an attention sink (https://arxiv.org/pdf/2309.17453.pdf) into RetNet? · #27 · pkpro · closed 1 year ago · 4 comments
Tokenizer Choice? · #26 · risedangel · closed 1 year ago · 1 comment
1. Bug fix. 2. Add fast long-retention implementation · #25 · veya2ztn · opened 1 year ago · 3 comments
Default config update, TensorParallel options (compatible with ColossalAI), training memory savings, training stability options · #24 · syncdoth · closed 1 year ago · 0 comments
Add Hidden Size for DeepSpeed integration · #23 · infosechoudini · opened 1 year ago · 2 comments
Huggingface integration for generate · #22 · syncdoth · closed 1 year ago · 2 comments
Torchscale 230930 · #21 · syncdoth · closed 1 year ago · 0 comments
Revert "Torchscale 230930" · #20 · syncdoth · closed 1 year ago · 0 comments
Torchscale 230930 · #19 · syncdoth · closed 1 year ago · 1 comment
Official implementation · #18 · syncdoth · closed 1 year ago · 1 comment
Can't Resume Training from Checkpoint · #17 · infosechoudini · closed 1 year ago · 1 comment
How to use multiple GPUs for model parallel training · #16 · zhihui-shao · opened 1 year ago · 5 comments
Passing attention_mask doesn't work for recurrent · #15 · infiniteperplexity · closed 1 year ago · 2 comments
Comments on the model · #14 · okpatil4u · opened 1 year ago · 4 comments
Can you provide a LICENSE file? · #13 · Shubhankar-Aidetic · closed 1 year ago · 2 comments
How to load my own model · #12 · zhihui-shao · closed 1 year ago · 1 comment
ValueError: not enough values to unpack (expected 2, got 1) · #11 · pathoncyp · closed 1 year ago · 3 comments
Changelog of official implementation · #10 · donglixp · closed 1 year ago · 5 comments
Fixes some issues encountered during model.generate invocations with do_sample=True · #9 · jploski · closed 1 year ago · 0 comments
Question about verifying the inference latency · #8 · LiZeng001 · closed 1 year ago · 3 comments
Parallel and recurrent forward produce totally different outputs · #7 · Zhihan1996 · closed 1 year ago · 2 comments
Encountered NaN while trying to train · #6 · liujuncn · opened 1 year ago · 10 comments
What's really missing is a parallel multi-GPU training scheme that pools GPU memory across all cards; getting that to work would count as a success! · #5 · gg22mm · closed 1 year ago · 4 comments
Errors when running your examples · #4 · houghtonweihu · closed 1 year ago · 2 comments
Training using HF Transformers · #3 · nebulatgs · closed 1 year ago · 1 comment
Huggingface Integration · #2 · syncdoth · closed 1 year ago · 1 comment
Somewhere that needs to be modified · #1 · liujuncn · closed 1 year ago · 1 comment