zihangdai/xlnet issues

zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Apache License 2.0

6.16k stars, 1.18k forks

issues

CPU->GPU Memcpy failed when finetuning with STS-B

#297 xavinatalia opened 2 months ago
0
Why are activation and dropout added after the classification layer?

#296 MrInouye closed 5 months ago
0
XLNet, Transformer-XL attention score function problem

#295 wonjunchoi-arc opened 7 months ago
0
Update data_utils.py

#294 ruxandrastancioi closed 1 year ago
0
pre-train xlnet for French language

#293 karimmahalian opened 1 year ago
0
XLNet Colab example error

#292 AlexTrinityBlock opened 1 year ago
1
【Huawei】2012 Lab: Project Cooperation & Exchange Invitation & Job Invitation - Zihang Dai

#291 HanLu1226 opened 1 year ago
0
run error about "InternalError (see above for traceback): Blas xGEMMBatched launch failed : a.shape=[12,512,64], b.shape=[12,64,512], m=512, n=512, k=64, batch_size=12"

#290 ccutyear opened 1 year ago
2
Tokens and values

#289 Dhurim opened 2 years ago
0
Update data_utils.py

#288 DLPerf opened 2 years ago
1
Performance issue in data_utils.py (by P3)

#287 DLPerf opened 2 years ago
1
Performance issues in the program

#286 DLPerf opened 2 years ago
0
Performance issue in the program

#285 DLPerf opened 2 years ago
1
TypeError: Fetch argument None has invalid type <class 'NoneType'> in train_gpu.py

#284 songhee-lee opened 2 years ago
1
How to get the XLNet vocabulary from spiece.model file and store it to a .vocab file?

#283 SambhawDrag opened 2 years ago
0
Feature/enhance predictions workflow

#282 agrudkow closed 3 years ago
0
How to pretrain on multiple GPU?

#281 DHZBill closed 3 years ago
0
checkpoint_management.py export info

#280 dll1314 opened 3 years ago
0
How are the positional encodings derived?

#279 bnicholl opened 3 years ago
0
specify tf version 1.x

#278 amrzv opened 3 years ago
0
Why is the first layer of the query stream initialized with the same vector w rather than different vectors?

#277 Huakui-Zhang opened 3 years ago
0
GPT vs BERT, under same computation and data resource, which one is better for downstream tasks like GLUE?

#276 guotong1988 opened 3 years ago
1
XLNet can't actually consistently outperform RoBERTa, can it?

#275 guotong1988 closed 3 years ago
1
What is the function of _sample_mask method?

#274 guotong1988 closed 3 years ago
1
Removing mem-reuse will not decrease the pretraining model performance for short text?

#273 guotong1988 opened 3 years ago
0
What is the relation between reuse_len and mem_len?

#272 guotong1988 closed 3 years ago
1
reuse_len=0 means no mem? And no benefit for long text but not worse for short text?

#271 guotong1988 closed 3 years ago
1
Problem with generating predictions from fine tuned classification model

#270 abdullahkhilji opened 3 years ago
0
Multi-gpu slower than single-gpu

#269 weiyx15 opened 3 years ago
1
OOM even with batch size 2 in train_gpu.py

#268 eddatt closed 4 years ago
0
Colab notebook cannot run under TensorFlow 2.0

#267 jlff opened 4 years ago
0
_split_a_and_b

#266 FruVirus closed 4 years ago
0
The special tokens of XLNet are different from BERT's

#265 lytum opened 4 years ago
2
get_sequence_output is not contextualized

#264 maziyarpanahi opened 4 years ago
1
Why is max_seq_length = 512 for XLNet?

#263 vr25 opened 4 years ago
4
Is Next Sentence Prediction implemented in the code?

#262 GhaliaRehawi opened 4 years ago
0
How to use your pretrained model for question answering?

#261 Alla-Abdella opened 4 years ago
2
ValueError when running ./gpu_squad_base.sh

#260 Omnis23 opened 4 years ago
3
OOM ERROR when using local batch size=128 on TPUv3-8

#259 GhaliaRehawi opened 4 years ago
1
Is it possible to feed XLNet into a seq2seq encoder/decoder NMT model (for a low-resource language)?

#258 JohnasSolomon opened 4 years ago
0
Can you upload the processor code(run_classifier.py) for glue dataset(cola, qqp, sst-2, rte, mrpc)?

#257 YJYJLee opened 4 years ago
1
Number of training epochs in original publication

#256 jjedele opened 4 years ago
0
Docker support

#255 sanjibnarzary opened 4 years ago
0
[CLS] token / during training process

#254 cherepanovic opened 4 years ago
0
Is real factorization?

#253 fangwch opened 4 years ago
0
Python2 to Python3?

#252 hammad26 opened 4 years ago
1
Commands for training and testing on IMDB dataset.

#251 VikasRajashekar opened 4 years ago
1
Changing Vocab size

#250 yusufani opened 4 years ago
0
Text classification with 3 classes

#249 VikasRajashekar opened 4 years ago
2
Normalization by NFKC

#248 Ina299 closed 4 years ago
1