zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0
6.18k stars 1.18k forks

Is xlnet indeed context aware? #222

Open studiocardo opened 5 years ago

studiocardo commented 5 years ago

Hi All

I've been playing with Spacy and BERT and I'm trying to see how the embedding of each word varies across different context.

For example, for the following three sentences:

```python
nlp = spacy.load("en_pytt_bertbaseuncased_lg")
apple1 = nlp("Apple shares rose on the news.")
apple2 = nlp("Apple sold fewer iPhones this quarter.")
apple3 = nlp("Apple pie is delicious.")
```

```python
print(apple1[0].similarity(apple2[0]))  # 0.73428553
print(apple1[0].similarity(apple3[0]))  # 0.43365782
```

0.7342856 0.43365765

As one would expect. So far so good. However, if I do the same w/

```python
nlp_xlnet = spacy.load("en_pytt_xlnetbasecased_lg")
apple1 = nlp_xlnet("Apple shares rose on the news.")
apple2 = nlp_xlnet("Apple sold fewer iPhones this quarter.")
apple3 = nlp_xlnet("Apple pie is delicious.")
print(apple1[0].similarity(apple2[0]))  # 0.9853272
print(apple1[0].similarity(apple3[0]))  # 0.9792127
```

0.9853272 0.9792127

This suggests that XLNet (at least in this example) is completely unaware of the context. Given XLNet's stellar GLUE and SQuAD 2.0 results, I was really surprised by this finding. Granted, it's only a trivial example, but it still gives me pause and makes me scratch my head.
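One possible explanation (an assumption on my part, not something established in this thread) is that XLNet's token vectors all share one large common component, so the cosine similarity between any two of them comes out high regardless of context. A minimal NumPy sketch with synthetic vectors shows how that would produce uniformly high similarities like the ones above:

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    # plain cosine similarity between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dim = 768
common = rng.normal(size=dim) * 10.0   # large shared component
ctx_a = common + rng.normal(size=dim)  # small context-specific parts
ctx_b = common + rng.normal(size=dim)
ctx_c = common + rng.normal(size=dim)

# All pairwise similarities land near 1.0, even though the
# context-specific parts are independent random vectors.
print(cosine(ctx_a, ctx_b))
print(cosine(ctx_a, ctx_c))
```

If something like this is happening, raw cosine similarity would be a poor probe of context sensitivity even when the context-specific information is present in the vectors.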

Has anyone else experienced similar results? Or have I done something wrong, or simply missed how the whole thing is supposed to work?

Thank you for your input. SH

illuminascent commented 5 years ago

FYI, I tried several ways to construct a sentence embedding from the text input and hidden outputs. They all turned out to be surprisingly similar in cosine similarity (just like the result you got), whereas the same procedure applied to BERT produced the expected similarities and dissimilarities. I thought it might just be the absence of a sentence-level pretraining task, but seeing your result makes me wonder even more.

stefan-it commented 5 years ago

What happens if you use the cased model of BERT 🤔

studiocardo commented 5 years ago

I am aware of the casing discrepancy. However, I can only use what came w/ Spacy… :(

I should have tried more examples with uncased words… I'll do that and report the results.

SH

On Aug 30, 2019, at 1:19 AM, Stefan Schweter notifications@github.com wrote:

What happens if you use the cased model of BERT 🤔


maziyarpanahi commented 4 years ago

I have observed a similar issue with context for word embeddings, which may explain why it behaves the same way at the sentence level.

In ELMo, BERT, and ALBERT, the models are all aware of the context: "Bank river." vs. "Bank robber."

The word Bank gets different embedding vectors since the context is different. Unfortunately, in XLNet, Bank gets the same embedding.

https://github.com/zihangdai/xlnet/issues/264

maziyarpanahi commented 4 years ago

Did anyone figure this out? I am still experiencing the same issue with no solution: https://github.com/zihangdai/xlnet/issues/264