rivo / uniseg

Unicode Text Segmentation, Word Wrapping, and String Width Calculation in Go
MIT License
570 stars 60 forks source link

Feature request: Grapheme.String() #52

Closed or-else closed 3 months ago

or-else commented 3 months ago

Could you please make Grapheme compliant with Stringer interface?

// String returns the string used to initialize the cluster satisfying Stringer interface.
func (g *Graphemes) String() string {
    return g.original
}

Thanks!

or-else commented 3 months ago

Or, alternatively, it could be more useful to return the the string starting from the current grapheme and to the end of the cluster:

// String returns the string starting from the current grapheme and to the end of the cluster 
// satisfying Stringer interface.
func (g *Graphemes) String() string {
    return g.cluster + g.remaining
}
rivo commented 3 months ago

Sounds good. The latest commit adds a String method.

or-else commented 3 months ago

Please please please! No! Do not alter the string. The truncation remaining[:10] + "..." makes it completely unusable.

or-else commented 3 months ago

The code should do just one thing which it is supposed to do. Any additional logic can be added by the caller.

Also, fmt.Sprintf is a lot more complicated with a ton of overhead compare to str1 + str2. Please just do return g.cluster + g.remaining.

rivo commented 3 months ago

I would ask you to provide justification for your requests. Please include information about how and why is needed in your specific application.

The text to be parsed may be in the megabytes or even larger (e.g. I'm using this package for an editor which can potentially process huge text files). It is completely unfeasible to return the entire text, given that the Stringer interface is often used for debugging or logging purposes. As such, IMHO, the String method should provide some indication of where we are within the string and what the current cluster is.

If you need to access the remaining text, I would propose a GetRemainingString method instead.

Also, if you're worried about performance, note that g.cluster + g.remaining will also allocate a new string which, again, is very inconvenient when dealing with large strings.

But again, please provide some real-world requirements instead of just asking for a feature.

or-else commented 3 months ago

I would ask you to provide justification for your requests. Please include information about how and why is needed in your specific application.

Please take a look at https://github.com/tinode/chat/pull/905/files specifically func graphemeToString(g *uniseg.Graphemes) string { in line 275.

This is absolutely horrendous and there is no way around it.

The text to be parsed may be in the megabytes or even larger (e.g. I'm using this package for an editor which can potentially process huge text files).

Yes, if it does not fit into memory then it's caller's problem to process huge text in chunks. There will always be a limit to memory.

It is completely unfeasible to return the entire text, given that the Stringer interface is often used for debugging or logging purposes.

I believe you are trying to solve a problem at the library level which should be addressed at a higher level. Like cat command line utility which truncates the file and appends '...' instead of just dumping it to stdout because someone may make a mistake.

Do you know any official Go package where String() acts the way you implemented it here?

As such, IMHO, the String method should provide some indication of where we are within the string and what the current cluster is.

Great, please add a method to Grapheme to return the current index into original.

If you need to access the remaining text, I would propose a GetRemainingString method instead.

That would be fine too.

Also, if you're worried about performance, note that g.cluster + g.remaining will also allocate a new string which, again, is very inconvenient when dealing with large strings.

Not all strings are 2GB in length. Most strings are short.

But again, please provide some real-world requirements instead of just asking for a feature.

or-else commented 3 months ago

BTW, if you are concerned with the size of the string, then something like this would work fine:

func (g *Graphemes) String() string {
    return g.original[g.currentClusterIndex:] 
}