kimiyoung transformer-xl issues

kimiyoung / transformer-xl

Apache License 2.0

3.59k stars 762 forks

issues

How to train transformer-xl for new datasets (Specifically Hindi)

#152 SandyPanda-MLDL opened 3 months ago
0
Why do you pass query, key, and value through the same fc_layer in transformer_xl model?

#151 wonjunchoi-arc opened 10 months ago
0
About Using

#150 Tuziking opened 10 months ago
0
[W C:\w\b\windows\pytorch\aten\src\ATen\native\cuda\Indexing.cu:963] Warning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (function masked_fill__cuda)

#148 Arsmart1 opened 1 year ago
0
How to obtain the data?

#147 Arsmart1 closed 1 year ago
0
docs: demo, experiments and live inference API on Tiyaro

#146 ijonglin opened 1 year ago
2
enwiki8 18 layer model .sh file

#145 vasily789 opened 2 years ago
0
hotfix for ParameterList in AdaptiveEmb, ProjAdaptiveSoftmax with DataParallel

#144 yurakuratov closed 2 years ago
0
Differences in DecoderLayer and RelDecoderLayers/RelPartialDecoderLayers

#143 jannessm closed 2 years ago
1
RelPartialLearnableDecoder vs RelLearnableDecoder

#142 jannessm closed 2 years ago
1
feat: replace einsum with matmul for efficiency

#141 yoyololicon opened 2 years ago
0
Why is i-j always > 0?

#140 scirocc opened 2 years ago
0
CUBLAS_STATUS_EXECUTION_FAILED and Blas GEMM launch failed

#139 CaoYiqingT opened 2 years ago
0
Relative Positional Encoding

#138 LarsHill opened 2 years ago
1
linux or windows?

#137 lyxwz opened 3 years ago
1
Can't get it to run

#136 lyons-deng opened 3 years ago
0
error

#135 li-wei-21 opened 3 years ago
0
Can you provide an example program that runs as a Python script?

#134 zane-star-bot opened 3 years ago
1
fixed the order of arguments of _update_mems() function. No impact fu…

#133 victor-psiori opened 3 years ago
0
Question: why is relative positional encoding computed with length M vs. L+M in the paper ?

#132 gdoras opened 3 years ago
0
Possibly Incorrect Calculation of Perplexity in Pytorch Implementation

#131 shaan97 opened 3 years ago
0
PyTorch programs have been killed unexpectedly

#130 Dinxin opened 3 years ago
0
The output of _rel_shift(...) does not conform to paper ?

#129 huangpeng1126 closed 3 years ago
1
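Issues #140, #129 and #111 all concern how relative positions are indexed. Below is a minimal pure-Python sketch (not the repository's PyTorch `_rel_shift`) of a pad-and-reshape relative shift, under assumed conventions: the positional sequence runs in descending order from klen - 1 down to 0, the keys are mlen memory positions followed by qlen current positions, and query i sits at absolute position mlen + i. The helpers `rel_shift` and `relative_scores` are hypothetical names used only for illustration.

```python
def rel_shift(x):
    """Pad-and-reshape shift on a qlen x klen matrix (list of lists).

    Hypothetical helper: prepends a zero column, flattens row-major, drops the
    first qlen entries, and reshapes back to qlen x klen. Row i ends up shifted
    left by (qlen - 1 - i); entries that wrap in from the next row are junk and
    correspond to future keys that a causal mask would hide.
    """
    qlen, klen = len(x), len(x[0])
    flat = [v for row in x for v in [0] + list(row)]  # zero-padded, flattened
    flat = flat[qlen:]  # drop the first qlen entries
    return [flat[i * klen:(i + 1) * klen] for i in range(qlen)]


def relative_scores(qlen, mlen):
    """Toy 'scores' in which column k simply stores the distance pos_seq[k]."""
    klen = mlen + qlen
    pos_seq = list(range(klen - 1, -1, -1))  # descending: klen-1, ..., 0
    # Before the shift, every query row sees the same descending sequence.
    return [list(pos_seq) for _ in range(qlen)]


if __name__ == "__main__":
    qlen, mlen = 3, 2
    shifted = rel_shift(relative_scores(qlen, mlen))
    for i, row in enumerate(shifted):
        visible = row[: mlen + i + 1]  # keys j <= mlen + i are not masked
        print(i, visible)
    # 0 [2, 1, 0]
    # 1 [3, 2, 1, 0]
    # 2 [4, 3, 2, 1, 0]
```

In this sketch only the first mlen + i + 1 entries of row i are meaningful, and each equals the distance (mlen + i) - j; the tail wraps in from the next row and would be hidden by a causal mask, so every visible distance is non-negative.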
Difference between ppl and bpc

#128 valofosho opened 3 years ago
0
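Perplexity (ppl) and bits per character (bpc) are two transformations of the same average cross-entropy. The sketch below assumes the loss is the mean negative log-likelihood per token in nats (natural log), which is what PyTorch's `cross_entropy` returns; ppl is its exponential, and converting it to base 2 gives bits per token, which equals bpc only when the tokens are characters (as on the character-level enwik8 and text8 benchmarks). The function names are hypothetical.

```python
import math


def perplexity(mean_nll_nats):
    """Perplexity from the average per-token negative log-likelihood in nats."""
    return math.exp(mean_nll_nats)


def bits_per_token(mean_nll_nats):
    """Same loss expressed in bits; this is bpc when each token is a character."""
    return mean_nll_nats / math.log(2)


if __name__ == "__main__":
    loss = math.log(2)  # e.g. a model that is 50/50 on every token
    print(perplexity(loss), bits_per_token(loss))
```

Because both are monotone in the same loss, perplexity = 2 ** bits_per_token; the two numbers are only comparable across models that use the same tokenization.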
Copyright missing

#127 kazuki-irie opened 3 years ago
0
Why use memory with LMShuffledIterator

#126 serkansulun opened 3 years ago
0
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

#125 demdecuong opened 3 years ago
0
Can someone please tell me on what dataset was transformer-XL pre-trained on?

#124 pathak-aman opened 3 years ago
0
StopIteration: Caught StopIteration in replica 0 on device 0.

#123 codybai opened 3 years ago
6
Remove unnecessary reshape

#122 hqbao closed 3 years ago
0
tf 2.x and python 3.x

#121 kdlin opened 3 years ago
1
TF base model memory requirements

#120 tonytan48 opened 4 years ago
0
Sin/Cos concatenation in Positional Embeddings

#119 zainsarwar865 opened 4 years ago
1
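Concatenating the sine and cosine halves ([sin..., cos...]) instead of interleaving them (sin, cos, sin, cos, ...) only permutes the embedding dimensions, so the dot product between two position embeddings is unchanged, and a learned projection applied afterwards can absorb the reordering. A minimal pure-Python sketch, assuming the usual 1 / 10000^(2k/d) frequencies; all function names are hypothetical and this is not the repository's `PositionalEmbedding`.

```python
import math


def inv_freqs(d_model):
    # 1 / 10000^(2k / d_model) for k = 0 .. d_model/2 - 1
    return [1.0 / (10000 ** (k / d_model)) for k in range(0, d_model, 2)]


def pos_emb_concat(pos, d_model):
    """[sin(pos*f_0), ..., sin(pos*f_n), cos(pos*f_0), ..., cos(pos*f_n)]"""
    angles = [pos * f for f in inv_freqs(d_model)]
    return [math.sin(a) for a in angles] + [math.cos(a) for a in angles]


def pos_emb_interleaved(pos, d_model):
    """[sin(pos*f_0), cos(pos*f_0), sin(pos*f_1), cos(pos*f_1), ...]"""
    out = []
    for a in (pos * f for f in inv_freqs(d_model)):
        out.extend([math.sin(a), math.cos(a)])
    return out


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


if __name__ == "__main__":
    d = 8
    print(dot(pos_emb_concat(3, d), pos_emb_concat(5, d)))
    print(dot(pos_emb_interleaved(3, d), pos_emb_interleaved(5, d)))
```

Both layouts give the same dot product, sum over k of cos(f_k * (p - q)), which depends only on the offset p - q.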
wrong argument order of _update_mems function!

#118 jech2 opened 4 years ago
1
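The Transformer-XL paper caches hidden states from previous segments (with gradients stopped) as a fixed-length memory. The sketch below models one layer's memory as a plain Python list and assumes the memory keeps only the most recent `mem_len` states, with older memory placed before the new segment. `update_memory` is a hypothetical name, not the repository's `_update_mems`, and treating `mems=None` as an empty memory is one possible answer to the question in #104, not necessarily what the repository does.

```python
def update_memory(mems, hids, mem_len):
    """Return the new memory: the last `mem_len` items of mems + hids.

    mems: previous memory (list), or None for the first segment.
    hids: hidden states of the current segment (list), oldest first.
    In a real model these would be tensors detached from the graph.
    """
    if mem_len <= 0:
        return []
    previous = [] if mems is None else list(mems)
    combined = previous + list(hids)  # older memory first, then the new segment
    return combined[-mem_len:]


if __name__ == "__main__":
    mem = None
    for segment in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
        mem = update_memory(mem, segment, mem_len=4)
        print(mem)
    # [1, 2, 3]
    # [3, 4, 5, 6]
    # [6, 7, 8, 9]
```

The order of the two inputs matters: the memory must read as older states followed by newer ones, so that truncating from the front drops the oldest states.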
fine-tune text classification?

#117 vr25 opened 4 years ago
0
What is the meaning of 'bsz' in mem_transformer.py?

#116 oshindow closed 4 years ago
0
Pytorch questions!

#115 garysun1994 opened 4 years ago
1
Different training steps in tf and pytorch

#114 richardbaihe closed 3 years ago
3
Perplexity does not change with tgt_len

#113 bajajahsaas opened 4 years ago
0
cannot reproduce SOTA wikitext103 results

#112 menghuanlater closed 4 years ago
4
Why pos_seq is in descending order as the input of positional embedding?

#111 GrindstoneLZX opened 4 years ago
2
PositionalEmbedding error

#110 Macielyoung closed 4 years ago
2
Bounty: PTB Transformer-xl

#109 srush closed 4 years ago
2
I don't quite understand Figure 1 in the paper. Could someone explain it?

#108 heroazhe closed 4 years ago
0
question on TRAIN_BSZ used in tf/scripts/text8_large_tpu.sh

#107 lelouchmatlab opened 4 years ago
0
Best settings to train Transformer-XL from scratch

#106 AndreaLK3 closed 4 years ago
0
qkv computation

#105 donglixp opened 4 years ago
0
what if mems is None?

#104 LindgeW opened 4 years ago
0
fix a typo (dataset name)

#103 LiyuanLucasLiu opened 4 years ago
0
How mem_len affects 1-billion lm experiment result

#102 cmathx opened 4 years ago
0