ncbi-nlp / bluebert

BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
https://arxiv.org/abs/1906.05474

How did you pre-train the NCBI abstract data exactly? #5

Closed zhouyunyun11 closed 5 years ago

zhouyunyun11 commented 5 years ago

In your manuscript, you describe it like this: "We initialized BERT with pre-trained BERT provided by (Devlin et al., 2019). We then continue to pre-train the model, using the listed corpora".

Did you use the BERT code to re-train on the NCBI abstract corpora completely from scratch? Or did you start from the initial BERT model and its WordPiece strategy, as in the BioBERT method?

yfpeng commented 5 years ago

We used the initial BERT model and the WordPiece strategy.
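
To make this answer concrete, here is a minimal sketch of continued pre-training from the initial BERT checkpoint. It is not the authors' code: it uses the Hugging Face `transformers` API rather than the original TensorFlow BERT scripts, trains with masked-LM only (the original recipe also includes next-sentence prediction), and the corpus path and hyperparameters are illustrative placeholders, not the actual BlueBERT settings.

```python
# Hypothetical sketch, NOT the authors' actual training code.
# Continued masked-LM pre-training from the released BERT checkpoint,
# reusing its original WordPiece vocab (no new vocabulary is learned).
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from Google's released checkpoint and its default vocab.txt.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# "pubmed_abstracts.txt" is a placeholder: pre-processed text, one line each.
dataset = load_dataset("text", data_files={"train": "pubmed_abstracts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_set = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Dynamic token masking at the 15% rate from the original BERT recipe.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bluebert-sketch",
    per_device_train_batch_size=32,
    num_train_epochs=1,      # illustrative; real runs train far longer
    learning_rate=5e-5,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_set,
    data_collator=collator,
).train()
```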

zhouyunyun11 commented 5 years ago

Do you mean you used the same strategy as Bio_BERT?

zhouyunyun11 commented 5 years ago

Did you create your own vocab.txt file, or use the Google default one?

yfpeng commented 5 years ago

We used the Google default vocab.txt.
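
To illustrate what reusing the default general-domain vocab.txt implies: biomedical terms missing from the vocabulary are not added as new entries; WordPiece simply splits them into subword pieces at tokenization time. A minimal sketch, assuming the Hugging Face `transformers` tokenizer as a stand-in for the original WordPiece implementation:

```python
# Sketch: tokenizing biomedical text with the default general-domain vocab.
# Out-of-vocabulary domain terms are split into WordPiece subwords rather
# than added as new vocabulary entries.
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

for word in ["lymphoma", "erythropoietin", "thrombocytopenia"]:
    print(word, "->", tokenizer.tokenize(word))
# Expect multi-piece splits (pieces prefixed with '##') for rare biomedical
# terms; the exact split depends on the vocabulary contents.
```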

yfpeng commented 5 years ago

I am not sure what you meant by "same strategy as Bio_BERT".