Open albertnanda opened 2 years ago
Is this expected?
```python
text = '''Mr. G. B. Shaw, known at his insistence simply as Bernard Shaw, was an Irish playwright.'''
print(blingfire.text_to_words(text).split())
print(list(nlp(text)))  # spaCy
```

Output:

```
['Mr', '.', 'G', '.', 'B', '.', 'Shaw', ',', 'known', 'at', 'his', 'insistence', 'simply', 'as', 'Bernard', 'Shaw', ',', 'was', 'an', 'Irish', 'playwright', '.']
[Mr., G., B., Shaw, ,, known, at, his, insistence, simply, as, Bernard, Shaw, ,, was, an, Irish, playwright, .]
```

The dot (.) in `Mr.` and `G.` should not be treated as a distinct token; each abbreviation should stay a single token together with its dot, as in the spaCy output.
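For reference, the behaviour I am after can be approximated by post-processing the token list. This is only a rough sketch, not something blingfire provides: `merge_abbreviation_dots` and the `ABBREVIATIONS` set are hypothetical names I made up for illustration, and the rule (re-attach a `.` to a known title or to a single capital letter) is an assumption that will misfire on sentences ending in a single capital letter.

```python
# Hypothetical post-processing helper, NOT part of blingfire.
# Assumed, illustrative list of titles written without their trailing dot.
ABBREVIATIONS = {"Mr", "Mrs", "Ms", "Dr", "Prof"}


def merge_abbreviation_dots(tokens):
    """Re-attach a '.' token to the previous token when that token is a
    known abbreviation or a single uppercase letter (an initial)."""
    merged = []
    for tok in tokens:
        prev = merged[-1] if merged else None
        is_initial = prev is not None and len(prev) == 1 and prev.isupper()
        if tok == "." and prev is not None and (prev in ABBREVIATIONS or is_initial):
            merged[-1] = prev + "."  # glue the dot back onto the abbreviation
        else:
            merged.append(tok)
    return merged


# Token list copied from the blingfire output above.
tokens = ['Mr', '.', 'G', '.', 'B', '.', 'Shaw', ',', 'known', 'at', 'his',
          'insistence', 'simply', 'as', 'Bernard', 'Shaw', ',', 'was', 'an',
          'Irish', 'playwright', '.']
print(merge_abbreviation_dots(tokens))
```

With this rule the sentence-final `.` after `playwright` stays a separate token, while `Mr.`, `G.` and `B.` are each one token.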