The current implementation of `UDPipeTextSegmentor` uses `Token.getTokenRangeStart()` and `Token.getTokenRangeEnd()` to get token ranges, but these values can be invalid (e.g. 140185923506224) when the word is part of a `MultiwordToken`. As a result, the raw tokens, tokens, and sentences it produces are invalid whenever the text contains multiword tokens.
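One possible way to avoid reading these invalid values is to not ask the words inside a multiword token for a range at all. Instead, take the range from the multiword token itself and skip the words it covers. The following is a minimal plain-Java sketch of that idea. It assumes, as in the CoNLL-U model UDPipe follows, that a multiword token knows the first and last word ids it spans and that a valid range is recorded on the multiword token. `Range`, `Multiword`, and `surfaceTokenRanges` are hypothetical names and are not part of UDPipe or `UDPipeTextSegmentor`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not UDPipe API: computes one character range per surface
// token, taking a multiword token's range from the multiword token itself
// instead of from the words it contains.
public class MultiwordRangeSketch {

    // Character range [start, end) of a surface token (hypothetical helper type).
    record Range(long start, long end) {}

    // A multiword token spanning word ids idFirst..idLast (1-based, inclusive),
    // like a CoNLL-U range line such as "2-3". Assumed to carry a valid range.
    record Multiword(int idFirst, int idLast, Range range) {}

    // wordRanges.get(i) is the range reported for word id i + 1. Entries for words
    // inside a multiword token are assumed to be unreliable and are never read.
    static List<Range> surfaceTokenRanges(List<Range> wordRanges, List<Multiword> multiwords) {
        Map<Integer, Multiword> byFirstId = new HashMap<>();
        for (Multiword mw : multiwords) {
            byFirstId.put(mw.idFirst(), mw);
        }
        List<Range> tokens = new ArrayList<>();
        int id = 1;
        while (id <= wordRanges.size()) {
            Multiword mw = byFirstId.get(id);
            if (mw != null) {
                tokens.add(mw.range()); // one surface token for the whole multiword token
                id = mw.idLast() + 1;   // skip the words it covers
            } else {
                tokens.add(wordRanges.get(id - 1));
                id++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Spanish "Voy al mar": "al" is one surface token split into the words "a" + "el".
        long garbage = 140185923506224L; // stands in for the invalid values from the issue
        List<Range> words = List.of(
                new Range(0, 3),             // word 1: Voy
                new Range(garbage, garbage), // word 2: a  (inside "al")
                new Range(garbage, garbage), // word 3: el (inside "al")
                new Range(7, 10));           // word 4: mar
        List<Multiword> multiwords = List.of(new Multiword(2, 3, new Range(4, 6)));
        System.out.println(surfaceTokenRanges(words, multiwords));
        // prints [Range[start=0, end=3], Range[start=4, end=6], Range[start=7, end=10]]
    }
}
```

Emitting the multiword token once, rather than once per contained word, also keeps the raw token list aligned with the surface text, which is what sentence boundaries are computed from.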