Open shivanraptor opened 1 year ago
Something like:
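import pycantonese
from pycantonese.word_segmentation import Segmenter

def custom_segment(input_str: str):
    segmenter = Segmenter()  # possible attributes: disallow, allow, max_word_length
    pyseg = pycantonese.segment(input_str, cls=segmenter)
    tokenized = []
    cursor = 0  # search from the end of the previous word so repeated words get distinct offsets
    for word in pyseg:
        start = input_str.find(word, cursor)
        end = start + len(word)
        tokenized.append((word, start, end))
        cursor = end
    return tokenized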
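The offset bookkeeping can also be kept separate from the segmenter. Below is a minimal sketch, assuming the segmented words appear in the input string in order and unchanged. `attach_offsets` is a hypothetical helper name, not part of PyCantonese, and the word list is hard-coded for illustration rather than taken from real segmenter output.

```python
from typing import Iterable, List, Tuple


def attach_offsets(text: str, words: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Pair each word with its (start, end) character offsets in `text`.

    Hypothetical helper, not part of PyCantonese. Assumes the words occur
    in `text` in order and unmodified.
    """
    spans = []
    cursor = 0  # resume searching right after the previous word
    for word in words:
        start = text.find(word, cursor)
        if start == -1:
            raise ValueError(f"{word!r} not found at or after index {cursor}")
        end = start + len(word)
        spans.append((word, start, end))
        cursor = end
    return spans


# Hard-coded word list for illustration; in practice it would come from
# pycantonese.segment(text).
text = "我好好"
words = ["我", "好", "好"]
spans = attach_offsets(text, words)
print(spans)
```

Searching from `cursor` rather than from index 0 is what keeps the two occurrences of the same word from being mapped to the same position.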
Feature you are interested in and your specific question(s): I'm studying the word segmentation feature of PyCantonese (https://pycantonese.org/word_segmentation.html). Does the function also return the start and end positions of each word?
What you are trying to accomplish with this feature or functionality: I would like to achieve:
Current result:
I would like to have the following result (with the start and end positions):
Thanks.