ispras-texterra / derek

DEREK (Domain Entities and Relations Extraction Kit)
GNU General Public License v3.0

Bug: UDPipeTextSegmentor doesn't work properly #31

Open trifonov-vl opened 4 years ago

trifonov-vl commented 4 years ago

The current implementation of UDPipeTextSegmentor uses Token.getTokenRangeStart() and Token.getTokenRangeEnd() to get token ranges, but these can return invalid values (something like 140185923506224) when a word is part of a MultiwordToken. As a result, raw tokens, tokens, and sentences are all invalid whenever the text contains multiword tokens.
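One possible direction for a fix, sketched here under assumptions: take the range from the enclosing multiword token instead of from its member words, and sanity-check every range against the text length. The `Word` and `MultiwordToken` classes below are hypothetical stand-ins that only mimic the relevant shape of the UDPipe objects (a word id, and a first/last id pair on the multiword token); `surface_token_spans` and `is_valid_span` are illustrative names, not part of DEREK or UDPipe.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]


@dataclass
class Word:
    """Hypothetical stand-in for a syntactic word; `span` may hold garbage."""
    id: int
    form: str
    span: Span


@dataclass
class MultiwordToken:
    """Hypothetical stand-in for a surface token covering words id_first..id_last."""
    id_first: int
    id_last: int
    form: str
    span: Span


def is_valid_span(span: Span, text_len: int) -> bool:
    """A span is usable only if it lies inside the text."""
    start, end = span
    return 0 <= start <= end <= text_len


def surface_token_spans(words: List[Word],
                        multiword_tokens: List[MultiwordToken],
                        text: str) -> List[Span]:
    """Return one (start, end) span per surface token, in word order."""
    by_first_id = {m.id_first: m for m in multiword_tokens}
    spans: List[Span] = []
    skip_until = 0  # id of the last word already covered by a multiword token
    for word in words:
        if word.id <= skip_until:
            continue  # this word's text is already covered by a multiword span
        mwt = by_first_id.get(word.id)
        if mwt is not None:
            span, skip_until = mwt.span, mwt.id_last
        else:
            span = word.span
        if not is_valid_span(span, len(text)):
            raise ValueError(f"invalid range {span} for token {word.form!r}")
        spans.append(span)
    return spans


# Made-up example: Spanish "al" is one surface token made of the words "a" + "el".
GARBAGE = (140185923506224, 140185923506224)  # value reported in this issue
text = "Vamos al mar"
words = [
    Word(1, "Vamos", (0, 5)),
    Word(2, "a", GARBAGE),
    Word(3, "el", GARBAGE),
    Word(4, "mar", (9, 12)),
]
multiword_tokens = [MultiwordToken(2, 3, "al", (6, 8))]

spans = surface_token_spans(words, multiword_tokens, text)
tokens = [text[start:end] for start, end in spans]
```

With the real bindings the ranges would come from the UDPipe objects rather than from a `span` field. This assumes the multiword token itself carries a valid range, which would need to be checked against the UDPipe version in use.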