Not using normalized Token when sentencizing

trungtv / vivi_spacy

A Vietnamese model for spaCy.io


Not using normalized Token when sentencizing #3

Closed: hieuhc closed this issue 6 years ago.

hieuhc commented 6 years ago

Hi, first of all, thanks for a very nice package.

When I use sentence segmentation, doc.sents returns sentences made of normalized tokens (multi-syllable words joined with underscores), e.g. "tạo một sân_chơi lành_mạnh để cán_bộ Hội được giao_lưu , trao_đổi kinh_nghiệm." Is there any way to get sentences with the original (non-normalized) tokens when sentencizing?

Thanks

trungtv commented 6 years ago

Hi, vivi_spacy does not currently support this feature. However, since many users have asked for a way to get sentences back with their original tokens, we will look into supporting it in the near future.
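Since vivi_spacy does not do this itself, one possible workaround is to keep the raw input string, align the normalized tokens back to it, and slice out each sentence's original text. This is only a sketch, and it rests on assumptions the thread does not confirm. It assumes the only normalization is replacing the whitespace between syllables with "_" (as in the example above). It also assumes the raw text and the tokens use the same Unicode form. `align_tokens` and `original_sentence` are hypothetical helpers, not part of vivi_spacy or spaCy.

```python
from typing import List, Tuple


def align_tokens(original: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Map each normalized token to a (start, end) character span in `original`.

    Assumption (not confirmed by vivi_spacy): the only normalization is that
    whitespace between syllables was replaced by "_".
    """
    spans = []
    pos = 0
    n = len(original)
    for tok in tokens:
        # Skip whitespace separating tokens in the original text.
        while pos < n and original[pos].isspace():
            pos += 1
        start = pos
        for ch in tok:
            if ch == "_" and pos < n and original[pos].isspace():
                # "_" stands for the whitespace between two syllables.
                while pos < n and original[pos].isspace():
                    pos += 1
            elif pos < n and original[pos] == ch:
                # Ordinary character (including a literal "_") matches as-is.
                pos += 1
            else:
                raise ValueError(f"cannot align token {tok!r} at offset {pos}")
        spans.append((start, pos))
    return spans


def original_sentence(original: str, spans: List[Tuple[int, int]],
                      start_i: int, end_i: int) -> str:
    """Return the raw text covering tokens start_i .. end_i - 1 (end exclusive)."""
    return original[spans[start_i][0]:spans[end_i - 1][1]]
```

With spaCy, a sentence `sent` from `doc.sents` exposes its token indices as `sent.start` and `sent.end` (end exclusive). So you could compute `spans = align_tokens(raw, [t.text for t in doc])` once and then call `original_sentence(raw, spans, sent.start, sent.end)` for each sentence, provided `t.text` holds the underscore-joined form. If the tokenizer applies other normalizations, `align_tokens` raises `ValueError` rather than returning a wrong span.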