spyzhov / ajson

Abstract JSON for Golang with JSONPath support
MIT License

Support Unicode codepoint length in length() function #80

Open Danialova opened 1 week ago

Danialova commented 1 week ago

The current length() function for strings counts bytes rather than Unicode codepoints, which makes it difficult to work with non-ASCII text. For example:

json := `{"text": "Hello 世界"}`
// Current behavior:
// length($.text) returns 12 (byte count)
// Desired behavior:
// length($.text) returns 8 (code point count: "Hello " = 6, "世界" = 2)
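
The difference between the two counts can be reproduced with the standard library alone. The following is a minimal sketch; the helper name byteAndRuneLen is made up for illustration and is not part of ajson.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneLen is a hypothetical helper: it returns the byte length
// and the Unicode code point count of s.
func byteAndRuneLen(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "Hello " is 6 ASCII bytes; each of the two CJK characters is 3 bytes in UTF-8.
	b, r := byteAndRuneLen("Hello 世界")
	fmt.Printf("bytes=%d code points=%d\n", b, r) // bytes=12 code points=8
}
```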

Suggested Implementation

Add a new function like strlen() or enhance the existing length() to handle Unicode properly by using utf8.RuneCountInString() from the standard library when operating on string values.

Example implementation approach:

if node.IsString() {
    // Count Unicode code points (runes) instead of bytes; requires import "unicode/utf8".
    return NumericNode("length", float64(utf8.RuneCountInString(node.MustString()))), nil
}
// existing array/object length logic...
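
To make the intended semantics concrete, here is a standalone sketch of what a separate strlen could compute. It deliberately avoids ajson's types: the strlen function below works on plain Go values and is an assumption for illustration, not the library's API. It also shows two edge cases worth covering in tests: utf8.RuneCountInString counts code points rather than user-perceived characters, and it counts each invalid byte as one rune.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// strlen is a hypothetical stand-in for the proposed function.
// It returns the number of Unicode code points in a string value
// and an error for any non-string input (an assumed policy).
func strlen(value any) (float64, error) {
	s, ok := value.(string)
	if !ok {
		return 0, errors.New("strlen: value is not a string")
	}
	return float64(utf8.RuneCountInString(s)), nil
}

func main() {
	inputs := []any{
		"Hello 世界", // 8 code points
		"e\u0301",  // "e" plus a combining accent: 2 code points, 1 visible character
		"\xff",     // invalid UTF-8: counted as a single rune
		42,         // not a string: error
	}
	for _, in := range inputs {
		n, err := strlen(in)
		fmt.Printf("%q -> %v, err=%v\n", fmt.Sprint(in), n, err)
	}
}
```

Whether non-string input should produce an error or fall back to the existing length() behavior is a design choice left to the maintainer.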

Benefits

This would make the library more useful for international text processing and JSONPath queries involving non-ASCII strings.

Let me know if you would like me to provide additional examples or test cases.

Related Go documentation: https://pkg.go.dev/unicode/utf8#RuneCountInString

spyzhov commented 1 day ago

Hello, thanks a lot. I like the idea of adding the strlen function, so I will do it :+1: