According to BERT's architecture, the loss is calculated as the sum of the mean masked LM likelihood and the mean next sentence prediction likelihood. Does this implementation include the next sentence prediction loss when calculating the total loss? Does the use of the [SEP] token have any effect on the training loss?
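For context, here is a toy sketch of how I understand the combined pre-training loss from the paper: the mean negative log-likelihood over the masked positions plus the negative log-likelihood of the next-sentence label. All numbers and function names below are made up for illustration; this is not this repository's code.

```python
import math

def masked_lm_loss(log_probs, label_ids):
    # mean negative log-likelihood over the masked positions
    return -sum(lp[l] for lp, l in zip(log_probs, label_ids)) / len(label_ids)

def next_sentence_loss(log_probs, label):
    # negative log-likelihood of the binary IsNext / NotNext label
    return -log_probs[label]

# toy log-probabilities (assumed values, illustration only)
mlm_log_probs = [[math.log(0.7), math.log(0.3)],
                 [math.log(0.4), math.log(0.6)]]
mlm_labels = [0, 1]
nsp_log_probs = [math.log(0.9), math.log(0.1)]
nsp_label = 0  # IsNext

# total pre-training loss = masked LM loss + next sentence prediction loss
total_loss = masked_lm_loss(mlm_log_probs, mlm_labels) \
           + next_sentence_loss(nsp_log_probs, nsp_label)
print(round(total_loss, 4))
```

My question is whether this second term is actually present in this implementation's loss.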