zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0

Question about bi_data and attention type with bsz #76

Open graykode opened 5 years ago

graykode commented 5 years ago

Hello. First, I know that the batch size must be an even number when bi_data is used. What about the attention type (uni- or bi-)? Is it related to whether the batch size is even or odd?

kimiyoung commented 5 years ago

No, it's not.

hockeybro12 commented 5 years ago

@kimiyoung Does the value of bsz change somewhere?

In train_gpu.py I set the train batch size to 8 and checked the value of the batch_size parameter there. However, the assertion assert bsz%2 == 0 in modeling.py still fails. Is there any reason for this?

graykode commented 5 years ago

@hockeybro12 Hello. Did you set bi_data to True?

hockeybro12 commented 5 years ago

@graykode Yes. Also I was able to run the model by commenting out that assertion and didn't get an error. That's interesting.

graykode commented 5 years ago

@hockeybro12 Which task are you running? That is, run_classification, run_squad, etc.

hockeybro12 commented 5 years ago

@graykode run_gpu. I'm trying to do my own pre-training.

graykode commented 5 years ago

@hockeybro12 This is because the assertion compares a tensor with a Python integer. I will fix it and submit a new pull request.

hockeybro12 commented 5 years ago

@graykode Ok, thanks.

vanh17 commented 5 years ago

Hi, I encountered the same problem: the assertion bsz%2==0 fails. I printed out bsz, and it was something like this:

I0717 06:04:18.694480 140162421307200 modeling.py:236] *******************************************batch size before assertion fail Tensor("model/transformer/strided_slice:0", shape=(), dtype=int32, device=/gpu:0)**************************************************

Has this been fixed recently? Thank you!
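
For anyone hitting the same failure: the log line above shows that bsz is a symbolic Tensor rather than a Python integer, which matches the explanation given earlier in the thread. The sketch below illustrates, under stated assumptions, why `assert bsz % 2 == 0` can fail in that case even when the real batch size is 8. It does not use TensorFlow. `SymbolicValue` is a hypothetical stand-in that only mimics what I understand to be TensorFlow 1.x graph-mode behavior (arithmetic returns another symbolic object, and `==` falls back to object identity), and `naive_is_even` and `check_even_batch` are illustrative names, not functions from the xlnet code base.

```python
import numbers


class SymbolicValue:
    """Hypothetical stand-in for a graph-mode tensor (NOT a TensorFlow class).

    Assumption: like a TF 1.x Tensor, arithmetic yields another symbolic
    value, and `==` is not overloaded, so it compares object identity.
    """

    def __init__(self, name):
        self.name = name

    def __mod__(self, other):
        # The result is still symbolic; no concrete number exists yet.
        return SymbolicValue("mod(%s, %r)" % (self.name, other))


def naive_is_even(bsz):
    # Mirrors the expression inside `assert bsz % 2 == 0`.
    return bsz % 2 == 0


def check_even_batch(bsz):
    """Validate bsz only when it is a concrete Python integer.

    Returns True if the check ran, False if bsz is symbolic and the
    check was skipped. Raises ValueError for an odd concrete bsz.
    """
    if isinstance(bsz, numbers.Integral):
        if bsz % 2 != 0:
            raise ValueError("bsz must be even with bi_data, got %d" % bsz)
        return True
    return False


symbolic_bsz = SymbolicValue("model/transformer/strided_slice:0")

print(naive_is_even(8))                # concrete int: real arithmetic
print(naive_is_even(symbolic_bsz))     # symbolic object compared with 0
print(check_even_batch(8))             # check ran
print(check_even_batch(symbolic_bsz))  # check skipped
```

With the stand-in, the naive expression evaluates to False for the symbolic value regardless of the eventual batch size, which is consistent with the assertion failing while the model still runs once the assertion is commented out. In real TensorFlow code, a check on a dynamic batch size would have to be expressed as a graph op (for example `tf.debugging.assert_equal`) or performed on a statically known shape; which of these the actual fix uses is not stated in this thread.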