undertheseanlp / underthesea

Underthesea - Vietnamese NLP Toolkit
http://undertheseanlp.com
GNU General Public License v3.0

Bug detecting names with hyphens. #730

Open home15c6 opened 2 months ago

home15c6 commented 2 months ago

I know hyphenated names like "Jean-Luc Godard" are not typical in Vietnamese, but they may appear in texts, such as news articles.

For ner('Jean-Luc Godard', deep=True):

Expected: B-PER, I-PER, I-PER -> 1 entity
Actual: B-PER, B-PER, I-PER -> 2 entities

Note: The model works as expected for "Công ty TNHH Bảo hiểm Nhân thọ Dai-ichi Việt Nam", which also contains a hyphen (Dai-ichi) -> 1 entity
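
For reference, here is a minimal sketch of how the entity counts follow from the BIO tags in this report. `bio_to_spans` is a hypothetical helper written for this illustration, not part of underthesea, and the tag lists are copied from the report by hand rather than produced by calling `ner`.

```python
def bio_to_spans(tags):
    """Group BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, kind = tag.partition("-")
        # An open entity continues only on an I- tag carrying the same label.
        continues = start is not None and prefix == "I" and kind == label
        if not continues:
            if start is not None:
                spans.append((label, start, i))  # close the open entity
            if prefix in ("B", "I"):
                start, label = i, kind           # open a new entity
            else:
                start, label = None, None        # "O" tag: nothing open
    if start is not None:
        spans.append((label, start, len(tags)))
    return spans


# Tag sequences as reported above (assumed to be one tag per token).
expected = ["B-PER", "I-PER", "I-PER"]
actual = ["B-PER", "B-PER", "I-PER"]

print(len(bio_to_spans(expected)), bio_to_spans(expected))
print(len(bio_to_spans(actual)), bio_to_spans(actual))
```

The second B-PER closes the first span and opens a new one, which is why the actual output is counted as two PER entities instead of one.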