BERT classification model for processing texts longer than 512 tokens. The text is first divided into smaller chunks; each chunk is fed to BERT, and the intermediate results are pooled. The implementation allows fine-tuning.
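As a rough illustration of the chunk-then-pool idea described above, here is a minimal sketch using the Hugging Face `transformers` library. The chunk size, stride, mean pooling of per-chunk [CLS] vectors, and the binary classification head are illustrative assumptions, not the repository's exact implementation.

```python
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 2)  # binary task assumed


def classify_long_text(text: str, chunk_size: int = 510, stride: int = 255) -> torch.Tensor:
    # Tokenize without truncation so the full document is kept.
    ids = tokenizer(text, add_special_tokens=False, return_tensors="pt")["input_ids"][0]
    cls_id, sep_id = tokenizer.cls_token_id, tokenizer.sep_token_id

    chunk_vectors = []
    for start in range(0, len(ids), stride):
        piece = ids[start:start + chunk_size]
        # Re-add [CLS]/[SEP] so each chunk looks like a normal BERT input.
        chunk = torch.cat([torch.tensor([cls_id]), piece, torch.tensor([sep_id])]).unsqueeze(0)
        mask = torch.ones_like(chunk)
        with torch.no_grad():  # inference sketch; drop no_grad() when fine-tuning
            out = bert(input_ids=chunk, attention_mask=mask)
        chunk_vectors.append(out.last_hidden_state[:, 0])  # per-chunk [CLS] embedding
        if start + chunk_size >= len(ids):
            break

    # Pool the per-chunk [CLS] vectors (mean pooling assumed) and classify.
    pooled = torch.stack(chunk_vectors).mean(dim=0)
    return classifier(pooled)


logits = classify_long_text("some very long document " * 500)
print(logits.shape)  # torch.Size([1, 2])
```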
Would it be okay to use the code below instead of BERT? #24
Original code
New code