zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0

get_sequence_output is not contextualized #264

Open maziyarpanahi opened 4 years ago

maziyarpanahi commented 4 years ago

Hi,

I finally managed to use get_sequence_output to get word embeddings after dealing with random embeddings due to dropout, random seed, etc.

However, the output of get_sequence_output() doesn't seem to be contextualized. If you have a string that says 'Bank river.' and get the embeddings for `Bank`, and then try another one with 'Bank robber.', the embeddings for `Bank` are identical in both tests. In BERT and other contextualized transformers, `Bank` has a different vector since the context is not the same.
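
The comparison described above can be sketched as follows. This is only an illustration: the vectors are placeholder values, not real XLNet output, and `cosine_similarity` and `looks_contextualized` are hypothetical helpers introduced here, not part of the xlnet code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def looks_contextualized(vec_a, vec_b, tol=1e-6):
    """True if the two vectors differ by more than tol in any dimension."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    return any(abs(x - y) > tol for x, y in zip(vec_a, vec_b))


# Placeholder vectors standing in for the embedding of "Bank" in each sentence.
bank_in_river = [0.12, -0.40, 0.88]
bank_in_robber = [0.12, -0.40, 0.88]

# Identical vectors, which is the symptom reported in this issue.
print(looks_contextualized(bank_in_river, bank_in_robber))  # prints False
```

With a contextualized model, the same check on the two real `Bank` vectors would be expected to return True, with a cosine similarity below 1.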

I tried playing around with the mask, segments, etc., but I always get the same embeddings for a given word in different contexts. I followed the advice and some examples, and the following are my configs:

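# Assumes xlnet.py from this repo is on the Python path.
from xlnet import XLNetConfig, RunConfig

# json_path points to the xlnet_config.json shipped with the pre-trained model.
xlnetConfig = XLNetConfig(FLAGS=None, json_path=json_path)
# Inference settings: dropout disabled so the output is deterministic.
xlnetRunConfig = RunConfig(
    is_training=False,
    use_tpu=False,
    use_bfloat16=False,
    dropout=0.0,
    dropatt=0.0
)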

I have seen some examples that use 0.1 for dropout, like the one here, but they have the random-embeddings issue: https://github.com/amansrivastava17/embedding-as-service/tree/master/server/embedding_as_service/text/xlnet

Are my XLNet config and run config correct to use the pre-trained weights/checkpoints?

maziyarpanahi commented 4 years ago

Unfortunately, I couldn't find any solution. It seems that, for some reason (it could be entirely my mistake), the XLNet pre-trained models are not aware of the surrounding tokens. So no matter what you put before or after a token, unlike BERT, it always generates the same vector for it.
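
One thing that may be worth checking here, offered as a possibility and not a confirmed diagnosis: the XLNetModel docstring in this repo describes input_ids, seg_ids and input_mask as shaped [len, bsz], i.e. sequence length first. If a [bsz, len] array were passed instead, a single sentence of N tokens would be read as N sequences of one token each, and each token would get the same vector regardless of its neighbours. Below is a minimal sketch of the transpose in plain Python. The token IDs are made up, and `to_time_major` is a hypothetical helper, not part of the xlnet code.

```python
def to_time_major(batch_major):
    """Transpose nested lists from [bsz, len] to [len, bsz]."""
    lengths = {len(row) for row in batch_major}
    if len(lengths) > 1:
        raise ValueError("all sequences must be padded to the same length")
    # zip(*rows) groups the i-th token of every sequence together.
    return [list(column) for column in zip(*batch_major)]


# Made-up token IDs for two sentences of four tokens each: shape [bsz=2, len=4].
batch = [
    [17, 23, 4, 3],
    [17, 99, 4, 3],
]

time_major = to_time_major(batch)  # shape [len=4, bsz=2]
print(time_major)  # prints [[17, 17], [23, 99], [4, 4], [3, 3]]
```

The same layout would apply to the segment IDs and the input mask. With TensorFlow tensors, the equivalent step would be a transpose of the two axes before the inputs are handed to the model.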