zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0
6.18k stars · 1.18k forks

Merging XLNet NER project into main repo? #68

Open stevezheng23 opened 5 years ago

stevezheng23 commented 5 years ago

Here is the XLNet extension project which includes a XLNet-NER implementation, https://github.com/stevezheng23/xlnet_extension_tf.

This XLNet extension project is currently importing zihangdai/xlnet repo as its submodule, maybe we can consider merging it into the main repo?

kimiyoung commented 5 years ago

Great. It looks like the results on NER are a bit behind the current SoTA, which is over 93. It would be great to see whether the hparams or implementation could be improved.

stevezheng23 commented 5 years ago

Yes, the result is from the initial run, I haven't tuned the hparams yet.


stefan-it commented 5 years ago

@stevezheng23 I have one question regarding the NER implementation: have you also experimented with using different layers? E.g., the BERT paper (Table 7) uses a feature-based approach with a concatenation of the last four layers. Could you give some details on which layers you're using in your repo? Thanks :)

stevezheng23 commented 5 years ago

@stefan-it In the initial experiments, I just fine-tuned the XLNet model by adding a dense + softmax layer on top of the last layer. For the feature-based approach, I haven't done the corresponding experiments yet.
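For readers following along, the head described above (a dense projection plus softmax over the final hidden states, producing a label distribution per token) can be sketched roughly as follows. This is a minimal numpy illustration, not the actual code from the xlnet_extension_tf repo; all shapes and names here are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the label dimension
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def ner_head(last_hidden, W, b):
    """Token classification head: project each token's last-layer
    hidden state to label logits, then softmax.

    last_hidden: [seq_len, hidden_size] output of the encoder's last layer
    W: [hidden_size, num_labels] dense weights, b: [num_labels] bias
    """
    logits = last_hidden @ W + b   # [seq_len, num_labels]
    return softmax(logits)         # per-token label distribution

# toy shapes (hypothetical): 5 tokens, hidden_size=8, 4 NER labels
rng = np.random.default_rng(0)
probs = ner_head(rng.normal(size=(5, 8)),
                 rng.normal(size=(8, 4)),
                 np.zeros(4))
```

In fine-tuning, `W` and `b` are trained jointly with the encoder against a per-token cross-entropy loss on the gold NER tags.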

mcggood commented 5 years ago

@stefan-it One off-topic question please: in "concatenation of the last four layers", does "last four layers" mean layers 9, 10, 11, and 12?

stefan-it commented 5 years ago

@mcggood Yes :) Btw: here are the results for the feature-based approach from the BERT paper:

[image: Table 7 from the BERT paper — feature-based CoNLL-2003 NER results]
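The feature-based variant discussed above can be sketched as follows: hidden states from the last four encoder layers are concatenated along the feature dimension and fed to a downstream tagger, with the encoder weights frozen. A minimal numpy illustration (hypothetical shapes, not code from either repo):

```python
import numpy as np

def concat_last_four(all_layers):
    """Feature-based representation in the style of BERT's Table 7:
    concatenate the last four hidden layers along the feature axis.

    all_layers: list of [seq_len, hidden_size] arrays, one per
    encoder layer, ordered from first to last.
    """
    return np.concatenate(all_layers[-4:], axis=-1)

# a 12-layer encoder, layers numbered 1..12 -> "last four" = 9, 10, 11, 12
# each toy layer output is filled with its layer number so the
# concatenation order is visible in the result
layers = [np.full((5, 8), i, dtype=float) for i in range(1, 13)]
feats = concat_last_four(layers)   # shape [5, 4 * 8] = [5, 32]
```

The concatenated features (here 4 × hidden_size wide per token) would then feed a task-specific classifier; only that classifier is trained in the feature-based setting.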