Support Unicode codepoint length in length() function

The current length() function for strings counts bytes rather than Unicode codepoints, which makes it difficult to work with non-ASCII text. For example:

json := `{"text": "Hello 世界"}`
// Current behavior:
// length($.text) returns 12 (UTF-8 byte count: "Hello " = 6, "世界" = 2 × 3)
// Desired behavior:
// length($.text) returns 8 (code point count: "Hello " = 6, "世界" = 2)
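
The two numbers above can be reproduced with the standard library alone. This is a minimal sketch and does not use ajson; byteAndRuneCount is a hypothetical helper name chosen for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount is a hypothetical helper (not part of ajson).
// It returns the UTF-8 byte length and the Unicode code point count of s.
func byteAndRuneCount(s string) (byteCount, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := byteAndRuneCount("Hello 世界")
	fmt.Printf("bytes=%d runes=%d\n", b, r) // prints "bytes=12 runes=8"
}
```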

Suggested Implementation

Either add a new function such as strlen(), or change the existing length() so that, for string values, it counts Unicode code points with utf8.RuneCountInString() from Go's unicode/utf8 package instead of counting bytes.

Example implementation approach:

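// Requires: import "unicode/utf8"
// Assumes the enclosing function returns (*Node, error).
if node.IsString() {
    // Count code points (runes) rather than bytes for string nodes.
    return NumericNode("length", float64(utf8.RuneCountInString(node.MustString()))), nil
}
// existing array/object length logic...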

This would make the library more useful for international text processing and JSONPath queries involving non-ASCII strings.
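
As a starting point for test cases, the rune-counting rule can be checked in isolation. The sketch below is standalone and makes no claim about ajson internals; codepointLength is a hypothetical name. It also shows two edge cases worth deciding on: combining sequences count one code point per mark, and each invalid UTF-8 byte counts as one rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codepointLength is a hypothetical helper (not part of ajson) that mirrors
// the proposed string branch: the number of code points as a float64.
func codepointLength(s string) float64 {
	return float64(utf8.RuneCountInString(s))
}

func main() {
	cases := []struct {
		in   string
		want float64
	}{
		{"", 0},
		{"abc", 3},
		{"Hello 世界", 8},
		{"e\u0301", 2},     // "e" plus a combining acute accent: two code points
		{"\U0001F44D", 1}, // one code point encoded as four UTF-8 bytes
		{"\xff", 1},       // an invalid byte is counted as a single rune
	}
	for _, c := range cases {
		got := codepointLength(c.in)
		fmt.Printf("%q: got %v, want %v, ok=%t\n", c.in, got, c.want, got == c.want)
	}
}
```

Note that this counts code points, not user-perceived characters, so a string that renders as a single accented letter may still report a length of 2.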

Benefits

More intuitive behavior for string length calculations
Better support for international text
Consistency with languages such as Python, where string length counts code points rather than bytes

Let me know if you would like me to provide additional examples or test cases.

Related Go documentation: https://pkg.go.dev/unicode/utf8#RuneCountInString

spyzhov / ajson

Support Unicode codepoint length in length() function #80
