xuyige / BERT4doc-Classification

Code and source for paper ``How to Fine-Tune BERT for Text Classification?``
Apache License 2.0
617 stars 99 forks

Dealing with multiple sentences #1

Open LivC193 opened 4 years ago

LivC193 commented 4 years ago

Hi, sorry to bother you, but I have one question.

Documents have multiple sentences, so how do you deal with that? Do you split the text into sentences and then concatenate the final embeddings for each sentence, or do you remove all punctuation marks so the text won't have any [SEP] tokens?

xuyige commented 4 years ago

Thank you for your issue. For document classification, we do not split the text into sentences (except in the hierarchical methods), and we do not remove punctuation marks. We treat the whole document as one long sentence.
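To make the "one long sentence" idea concrete, here is a minimal sketch of how a multi-sentence document could be packed into a single BERT segment. It is not taken from the repository: the function name `build_single_segment_input` is hypothetical, whitespace splitting stands in for BERT's WordPiece tokenizer, and simple head truncation is assumed for documents that exceed `max_len`.

```python
def build_single_segment_input(document, max_len=512):
    """Pack a whole document into one BERT segment (illustrative only).

    Assumptions: whitespace splitting replaces WordPiece tokenization,
    and over-long documents are truncated from the end.
    """
    # Punctuation is kept; sentence boundaries are simply ignored.
    tokens = document.split()
    # Reserve two positions for the special tokens.
    tokens = tokens[: max_len - 2]
    # A single [CLS] at the start and a single [SEP] at the end,
    # with no [SEP] inserted between sentences.
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # Every token belongs to segment 0 because there is only one segment.
    segment_ids = [0] * len(tokens)
    return tokens, segment_ids


tokens, segment_ids = build_single_segment_input(
    "First sentence. Second sentence.", max_len=8
)
print(tokens)
print(segment_ids)
```

Sentence-final punctuation stays in the token stream, and [SEP] appears only once at the end, so multiple sentences do not introduce extra separators.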

AnastasiaMaugham commented 3 years ago


Hi, could you tell me how to handle different numbers of sentences in the hierarchical methods (variable-length inputs)?
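One common way to handle a varying number of chunks per document is to pad each document to the largest chunk count in the batch and pool with a mask. The sketch below is not the authors' implementation and only illustrates the idea in plain Python. The helper names (`chunk_tokens`, `pad_chunk_vectors`, `masked_mean_pool`) are hypothetical, and the per-chunk vectors are assumed to come from an encoder such as BERT's [CLS] output, which is not shown here.

```python
def chunk_tokens(tokens, chunk_len=510):
    """Split a token list into consecutive chunks of at most chunk_len tokens."""
    return [tokens[i:i + chunk_len] for i in range(0, len(tokens), chunk_len)]


def pad_chunk_vectors(batch, dim):
    """Pad every document to the same number of chunk vectors.

    batch: list of documents, each a list of per-chunk vectors of size dim.
    Returns the padded batch and a 0/1 mask marking the real chunks.
    """
    max_chunks = max(len(doc) for doc in batch)
    padded, mask = [], []
    for doc in batch:
        n_pad = max_chunks - len(doc)
        padded.append(doc + [[0.0] * dim for _ in range(n_pad)])
        mask.append([1] * len(doc) + [0] * n_pad)
    return padded, mask


def masked_mean_pool(padded, mask):
    """Average only the real chunk vectors of each document."""
    pooled = []
    for vectors, doc_mask in zip(padded, mask):
        n_real = max(sum(doc_mask), 1)  # guard against empty documents
        dim = len(vectors[0])
        pooled.append([
            sum(v[d] for v, keep in zip(vectors, doc_mask) if keep) / n_real
            for d in range(dim)
        ])
    return pooled


# Two documents with 2 and 1 chunks; vectors are toy 2-dimensional values.
batch = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
padded, mask = pad_chunk_vectors(batch, dim=2)
print(masked_mean_pool(padded, mask))
```

The same mask can drive max pooling or attention instead of the mean: padded positions are excluded before combining, so each document yields one fixed-size vector regardless of how many chunks it had.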