letuananh / bela

👸 BELA - A pathway for creating and analysing multi-lingual transcripts using BELA convention and ELAN software
MIT License

Remove ellipsis when tokenizing special tokens such as `:m:` and `:si:` #3

Open vicchuayh opened 1 year ago

In BELA v2.0.0a21.post6, the `tokenize()` function does not strip an ellipsis that is embedded in a singing or mimicking segment.

E.g., applying `tokenize()` to `:si:walking+in+the+jungle+walking+in+the+jung...` yields: `['walking', 'in', 'the', 'jungle', 'walking', 'in', 'the', 'jung...']`

Proposal: remove the ellipsis when the `ellipsis` parameter is set to `True`. The expected result would then be: `['walking', 'in', 'the', 'jungle', 'walking', 'in', 'the', 'jung']`.
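To make the proposed behaviour concrete, here is a minimal standalone sketch. It is not BELA's actual implementation: the function name `tokenize_special_sketch`, the `+` splitting rule, and the handling of the `:si:` / `:m:` prefixes are assumptions based only on the example above.

```python
import re

# Hypothetical helper, NOT part of BELA's API; it only illustrates the proposal.
# Assumption: special segments start with ':si:' or ':m:' and join words with '+'.
_SPECIAL_PREFIX = re.compile(r"^:(?:si|m):")
# Matches a trailing three-dot ellipsis or the single '…' character.
_TRAILING_ELLIPSIS = re.compile(r"(?:\.{3}|…)$")


def tokenize_special_sketch(text, ellipsis=False):
    """Split a special segment into tokens.

    When `ellipsis` is True, a trailing ellipsis is stripped from each token.
    """
    body = _SPECIAL_PREFIX.sub("", text)
    tokens = [tok for tok in body.split("+") if tok]
    if ellipsis:
        tokens = [_TRAILING_ELLIPSIS.sub("", tok) for tok in tokens]
        # Drop tokens that consisted of nothing but an ellipsis.
        tokens = [tok for tok in tokens if tok]
    return tokens


segment = ":si:walking+in+the+jungle+walking+in+the+jung..."
print(tokenize_special_sketch(segment, ellipsis=True))
# ['walking', 'in', 'the', 'jungle', 'walking', 'in', 'the', 'jung']
```

With `ellipsis=False` the sketch leaves the last token as `'jung...'`, which matches the current behaviour reported above.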