I’m trying to build a line-breaking algorithm on top of the word segmenter so that I can lay out text in paragraphs in SVG. With word granularity, Intl.Segmenter puts words and punctuation into separate segments, so “Who? Why?” becomes five segments: “Who”, “?”, “ ”, “Why”, and “?”.
When laying out text in paragraphs, I usually want punctuation to stay glued to the nearest word, but the segmenter doesn’t return enough information to do this: each segment only reports “isWordLike”, which is false for whitespace and punctuation alike. It might be nice if the segmenter had an option to keep punctuation with words during segmentation, or if the segment iterator returned information beyond “isWordLike”. Perhaps “isWhitespace” and/or “isPunctuation” would be enough, but I’m not sure.
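For illustration, here is a minimal sketch of the kind of userland workaround this currently requires. It assumes whitespace can be detected with a `\s` regex and that opening punctuation can be recognized through the Unicode categories `Ps` and `Pi`. The name `glueChunks` is hypothetical, and none of this classification comes from Intl.Segmenter itself.

```javascript
// Sketch only: group word-granularity segments into unbreakable chunks.
// Whitespace is treated as a break opportunity and dropped from the output.
// The regex-based classification is an assumption, not part of Intl.Segmenter.
function glueChunks(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const chunks = [];
  let current = "";
  let hasWord = false; // does the current chunk already contain a word?

  const flush = () => {
    if (current !== "") chunks.push(current);
    current = "";
    hasWord = false;
  };

  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (/^\s+$/u.test(segment)) {
      // Whitespace ends the current chunk.
      flush();
    } else if (isWordLike) {
      // A second word with no whitespace in between starts a new chunk.
      if (hasWord) flush();
      current += segment;
      hasWord = true;
    } else {
      // Opening brackets/quotes attach to the next word; everything else
      // attaches to the chunk being built.
      if (hasWord && /^[\p{Ps}\p{Pi}]/u.test(segment)) flush();
      current += segment;
    }
  }
  flush();
  return chunks;
}

console.log(glueChunks("Who? Why?")); // → ["Who?", "Why?"]
```

This works for simple cases, but having to guess at whitespace and punctuation with regexes is exactly the gap that something like “isWhitespace” or “isPunctuation” on the segment data would close.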