rivo / uniseg

Unicode Text Segmentation, Word Wrapping, and String Width Calculation in Go
MIT License

Any chance of implementing word-segmenting? #2

Closed: cpence closed this issue 2 years ago

cpence commented 5 years ago

Hello! I could really use a Golang implementation of the word-splitting rules in UAX#29. I think that with the kind of parser framework you have here, it'd be relatively easy to implement (I could even hack away at a probably-poor pull request, if you were interested). Have you considered adding it to the library?

rivo commented 5 years ago

Yes, it's definitely on the roadmap. It's just a question of finding the time, so I can't tell you when it will become part of the library. There's a chance, however, that its priority will increase due to rivo/tview#251.

It's probably not worth submitting a "poor pull request", as getting it in order might take me more time than doing it myself (see also here as it applies to this repo, too).

I'll leave this open for now so I can keep track of it.

clipperhouse commented 4 years ago

I’d love to see this project continue. There is an existing library that does it, but it has drawbacks — mainly that it’s code-gen/ported and not as idiomatic as a ‘native’ implementation might be.

rivo commented 2 years ago

This package now supports word boundary detection according to UAX#29. Additionally, sentence boundary detection as well as line breaking / word wrapping (UAX#14) have been added.

cpence commented 2 years ago

Fantastic, thank you so much! (Funnily enough, I found this library because I'm working on text analysis tools, but in the years since I've also become a huge fan of tview, so thanks a bunch for your great work there, too!)

rivo commented 2 years ago

Thanks a lot!

Yes, tview and uniseg are related. tview triggered the creation of uniseg. And these latest additions were necessary because tview is getting a text area (i.e. multiline editing).

I'd say 80% of my time working on tview is spent dealing with Unicode-related issues. No exaggeration.

clipperhouse commented 2 years ago

Nice!

If you'll forgive my horn-tooting, in the interim since my comment above, I've implemented UAX 29 as well: https://github.com/clipperhouse/uax29

rivo commented 2 years ago

Cool! Yeah, it took me more than 3 years to add this. All of my open source projects move, but they move slowly. So I completely understand when people do it themselves.

Kudos for getting into UAX #29 and making it work!