syncdoth/RetNet
Huggingface-compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf), including parallel, recurrent, and chunkwise forward passes.
MIT License · 226 stars · 24 forks
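As background for the recurring parallel-vs-recurrent questions in the issues below (e.g. #39, #15, #7), here is a minimal single-head sketch of the two retention forms from the paper. It is an illustration under simplifying assumptions, not this repository's API: it omits multi-head retention, the paper's multi-scale decay, group normalization, gating, and position rotation.

    # Minimal single-head retention sketch (illustration only, NOT this repo's code).
    # Parallel form:  O = (Q K^T . D) V, with D[n, m] = gamma^(n - m) for n >= m, else 0.
    # Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n,  o_n = q_n S_n.
    import torch

    def parallel_retention(q, k, v, gamma):
        # q, k: (T, d_k); v: (T, d_v); processes the whole sequence at once.
        t = torch.arange(q.shape[0])
        decay = gamma ** (t[:, None] - t[None, :]).float()  # gamma^(n - m)
        d = torch.tril(decay)                               # causal: zero where m > n
        return (q @ k.T * d) @ v

    def recurrent_retention(q, k, v, gamma):
        # Same output computed token by token with a (d_k, d_v) state.
        state = torch.zeros(q.shape[1], v.shape[1])
        outs = []
        for n in range(q.shape[0]):
            state = gamma * state + k[n, :, None] @ v[n, None, :]  # outer product k_n^T v_n
            outs.append(q[n, None, :] @ state)                     # o_n = q_n S_n
        return torch.cat(outs)

    torch.manual_seed(0)
    q, k, v = (torch.randn(8, 16) for _ in range(3))
    assert torch.allclose(parallel_retention(q, k, v, 0.9),
                          recurrent_retention(q, k, v, 0.9), atol=1e-4)

The parallel form is quadratic in sequence length but trains efficiently on whole sequences; the recurrent form costs O(1) per token, which is what makes inference cheap. The chunkwise forward mentioned in the description interpolates between the two: the parallel form runs within each chunk, and the recurrent state is carried across chunk boundaries.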
Issues
Should I pass attention_mask during the SFT process? · #40 · wac81 · opened 7 months ago · 0 comments
Parallel inference is expected to be faster than recurrent inference, but in the play file it turns out not to be · #39 · wac81 · opened 7 months ago · 0 comments
A question about inference · #38 · wac81 · opened 7 months ago · 0 comments
Integration with transformers library · #37 · kiucho · opened 7 months ago · 0 comments
HuggingFace checkpoint · #36 · xtwigs · opened 10 months ago · 2 comments
How to load the model with device_map="auto" · #35 · wac81 · opened 10 months ago · 1 comment
The number of parameters does not match the settings in the paper · #34 · ziHoHe · opened 10 months ago · 1 comment
Can't train the 3B model on a single 48GB card · #33 · wac81 · closed 7 months ago · 2 comments
Can you support streaming when generating? · #32 · wac81 · opened 10 months ago · 2 comments
Initialize word embedding layer · #31 · hyunwoongko · closed 10 months ago · 7 comments
Info/Documentation on chunkwise training · #30 · pkpro · opened 11 months ago · 5 comments
Added description for torch.compile · #29 · ce-lery · opened 12 months ago · 1 comment
gradient_checkpointing=True issue in TrainerArgument · #28 · lolshuo · closed 1 year ago · 1 comment
Would it be possible to integrate an attention sink (https://arxiv.org/pdf/2309.17453.pdf) into RetNet? · #27 · pkpro · closed 1 year ago · 4 comments
Tokenizer Choice? · #26 · risedangel · closed 1 year ago · 1 comment
1. Bug fix. 2. Add fast long-retention implementation · #25 · veya2ztn · opened 1 year ago · 3 comments
Default config update, TensorParallel options (compatible with ColossalAI), training memory savings, training stability options · #24 · syncdoth · closed 1 year ago · 0 comments
Add Hidden Size for DeepSpeed integration · #23 · infosechoudini · opened 1 year ago · 2 comments
Huggingface integration for generate · #22 · syncdoth · closed 1 year ago · 2 comments
Torchscale 230930 · #21 · syncdoth · closed 1 year ago · 0 comments
Revert "Torchscale 230930" · #20 · syncdoth · closed 1 year ago · 0 comments
Torchscale 230930 · #19 · syncdoth · closed 1 year ago · 1 comment
Official implementation · #18 · syncdoth · closed 1 year ago · 1 comment
Can't Resume Training from Checkpoint · #17 · infosechoudini · closed 1 year ago · 1 comment
How to use multiple GPUs for model parallel training · #16 · zhihui-shao · opened 1 year ago · 5 comments
Passing attention_mask doesn't work for recurrent · #15 · infiniteperplexity · closed 1 year ago · 2 comments
Comments on the model · #14 · okpatil4u · opened 1 year ago · 4 comments
Can you provide a LICENSE file? · #13 · Shubhankar-Aidetic · closed 1 year ago · 2 comments
How to load my own model · #12 · zhihui-shao · closed 1 year ago · 1 comment
ValueError: not enough values to unpack (expected 2, got 1) · #11 · pathoncyp · closed 1 year ago · 3 comments
Changelog of official implementation · #10 · donglixp · closed 1 year ago · 5 comments
Fixes some issues encountered during model.generate invocations with do_sample=True · #9 · jploski · closed 1 year ago · 0 comments
Question about verifying the inference latency · #8 · LiZeng001 · closed 1 year ago · 3 comments
Parallel and recurrent forward produce totally different outputs · #7 · Zhihan1996 · closed 1 year ago · 2 comments
Encountered NaN while trying to train · #6 · liujuncn · opened 1 year ago · 10 comments
What's really missing is a parallel multi-GPU training scheme that pools GPU memory across all cards; getting that to work would count as a success! · #5 · gg22mm · closed 1 year ago · 4 comments
Errors when running your examples · #4 · houghtonweihu · closed 1 year ago · 2 comments
Training using HF Transformers · #3 · nebulatgs · closed 1 year ago · 1 comment
Huggingface Integration · #2 · syncdoth · closed 1 year ago · 1 comment
Somewhere that needs to be modified · #1 · liujuncn · closed 1 year ago · 1 comment