kimiyoung/transformer-xl
Apache License 2.0
3.59k stars
762 forks
issues
How to train transformer-xl for new datasets (Specifically Hindi)
#152
SandyPanda-MLDL
opened
3 months ago
0
Why do you pass query, key, and value through the same fc_layer in transformer_xl model?
#151
wonjunchoi-arc
opened
10 months ago
0
About Using
#150
Tuziking
opened
10 months ago
0
[W C:\w\b\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:963] Warning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (function masked_fill__cuda)
#148
Arsmart1
opened
1 year ago
0
How to obtain the data?
#147
Arsmart1
closed
1 year ago
0
docs: demo, experiments and live inference API on Tiyaro
#146
ijonglin
opened
1 year ago
2
enwiki8 18 layer model .sh file
#145
vasily789
opened
2 years ago
0
hotfix for ParameterList in AdaptiveEmb, ProjAdaptiveSoftmax with DataParallel
#144
yurakuratov
closed
2 years ago
0
Differences in DecoderLayer and RelDecoderLayers/RelPartialDecoderLayers
#143
jannessm
closed
2 years ago
1
RelPartialLearnableDecoder vs RelLearnableDecoder
#142
jannessm
closed
2 years ago
1
feat: replace einsum with matmul for efficiency
#141
yoyololicon
opened
2 years ago
0
why i-j always>0
#140
scirocc
opened
2 years ago
0
CUBLAS_STATUS_EXECUTION_FAILED and Blas GEMM launch failed
#139
CaoYiqingT
opened
2 years ago
0
Relative Positional Encoding
#138
LarsHill
opened
2 years ago
1
linux or windows?
#137
lyxwz
opened
3 years ago
1
Cannot get it to run
#136
lyons-deng
opened
3 years ago
0
error
#135
li-wei-21
opened
3 years ago
0
can you provide an example program running with Python script?
#134
zane-star-bot
opened
3 years ago
1
fixed the order of arguments of _update_mems() function. No impact fu…
#133
victor-psiori
opened
3 years ago
0
Question: why is relative positional encoding computed with length M vs. L+M in the paper ?
#132
gdoras
opened
3 years ago
0
Possibly Incorrect Calculation of Perplexity in Pytorch Implementation
#131
shaan97
opened
3 years ago
0
Pytorch programs have been killed unexpectedly
#130
Dinxin
opened
3 years ago
0
The output of _rel_shift(...) does not conform to paper ?
#129
huangpeng1126
closed
3 years ago
1
Difference between ppl and bpc
#128
valofosho
opened
3 years ago
0
Copyright missing
#127
kazuki-irie
opened
3 years ago
0
Why use memory with LMShuffledIterator
#126
serkansulun
opened
3 years ago
0
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
#125
demdecuong
opened
3 years ago
0
Can someone please tell me on what dataset was transformer-XL pre-trained on?
#124
pathak-aman
opened
3 years ago
0
StopIteration: Caught StopIteration in replica 0 on device 0.
#123
codybai
opened
3 years ago
6
Remove unnecessary reshape
#122
hqbao
closed
3 years ago
0
tf 2.x and python 3.x
#121
kdlin
opened
3 years ago
1
TF base model memory requirements
#120
tonytan48
opened
4 years ago
0
Sin/Cos concatenation in Positional Embeddings
#119
zainsarwar865
opened
4 years ago
1
wrong argument order of _update_mems function!
#118
jech2
opened
4 years ago
1
fine-tune text classification?
#117
vr25
opened
4 years ago
0
What is the meaning of 'bsz' in mem_transformer.py?
#116
oshindow
closed
4 years ago
0
Pytorch questions!
#115
garysun1994
opened
4 years ago
1
Different training steps in tf and pytorch
#114
richardbaihe
closed
3 years ago
3
Perplexity not changes with tgt_len
#113
bajajahsaas
opened
4 years ago
0
can not reproduce sota wikitext103 results
#112
menghuanlater
closed
4 years ago
4
Why pos_seq is in descending order as the input of positional embedding?
#111
GrindstoneLZX
opened
4 years ago
2
PositionalEmbedding error
#110
Macielyoung
closed
4 years ago
2
Bounty: PTB Transformer-xl
#109
srush
closed
4 years ago
2
I don't quite understand Figure 1 in the paper; could an expert explain it?
#108
heroazhe
closed
4 years ago
0
question on TRAIN_BSZ used in tf/scripts/text8_large_tpu.sh
#107
lelouchmatlab
opened
4 years ago
0
Best settings to train Transformer-XL from scratch
#106
AndreaLK3
closed
4 years ago
0
qkv computation
#105
donglixp
opened
4 years ago
0
what if mems is None?
#104
LindgeW
opened
4 years ago
0
fix a typo (dataset name)
#103
LiyuanLucasLiu
opened
4 years ago
0
How mem_len affects 1-billion lm experiment result
#102
cmathx
opened
4 years ago
0