nitishm / go-rejson

Golang client for redislabs' ReJSON module with support for multiple redis clients (redigo, go-redis)
MIT License
343 stars 47 forks

Invalid data is read in JSONGet when a struct with Unicode strings is passed into JSONSet. #56

Closed sheophe closed 2 years ago

sheophe commented 2 years ago

Describe the bug JSONGet does not produce the correct output for Cyrillic text.

To Reproduce Steps to reproduce the behavior:

  1. Use JSONSet to write a structure with string fields. Fill the fields with Unicode characters.
  2. Use res, err := JSONGet() to read the structure.
  3. Convert res into []byte, either with a type assertion (res.([]byte)) or using redigo.Bytes(res).
  4. Unmarshal the []byte result into the structure via json.Unmarshal().
  5. Compare values you have written with values json.Unmarshal produced from JSONGet result.
  6. Observe that the new structure contains fields with different (seemingly random) characters.

Expected behavior The fields of the first structure (the one we wrote) and the second one (the one we read back) should match.

Additional context The problem I found lies within the rjs.StringToBytes function, which is called from JSONGet. It contains the following lines (_lst is a string, by is a []byte):

for _, s := range _lst {
    by = append(by, byte(s))
}

Here, s is a rune, which is an alias for int32 and holds a full Unicode code point. When we convert it into a byte, we lose all but the least significant byte, so any character above U+00FF is corrupted. The fix is pretty straightforward: we just need to convert the string into []byte directly, without looping over each rune:

by = []byte(_lst)
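
To make the difference concrete, here is a minimal standalone sketch that compares the two conversions on a Cyrillic string. The function names truncatingBytes and directBytes are made up for this illustration and are not part of go-rejson; the first one only mirrors the loop quoted above.

```go
package main

import "fmt"

// truncatingBytes mirrors the loop quoted above: ranging over a string yields
// runes (full code points), and byte(r) keeps only the lowest 8 bits of each.
// Hypothetical helper, not the library's actual function.
func truncatingBytes(s string) []byte {
	var by []byte
	for _, r := range s {
		by = append(by, byte(r))
	}
	return by
}

// directBytes applies the proposed fix: a plain conversion that copies the
// string's UTF-8 bytes unchanged.
func directBytes(s string) []byte {
	return []byte(s)
}

func main() {
	s := "Привет"
	// Six runes collapse into six unrelated single bytes.
	fmt.Printf("truncated: % x\n", truncatingBytes(s)) // 1f 40 38 32 35 42
	// Twelve bytes: each Cyrillic letter takes two bytes in UTF-8.
	fmt.Printf("direct:    % x\n", directBytes(s)) // d0 9f d1 80 d0 b8 d0 b2 d0 b5 d1 82
}
```

For example, П is U+041F, so the loop keeps only 0x1F, while the direct conversion keeps its two UTF-8 bytes 0xD0 0x9F.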

I've copied JSONGet into my own code and applied this fix, and my Unicode problem was solved.
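
The effect can also be checked without a Redis server. The sketch below is only an approximation under stated assumptions: the Greeting type, perRune, and roundTrip are hypothetical names, and the JSON produced by json.Marshal stands in for the string reply that JSONGet would otherwise convert.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Greeting is a hypothetical payload type used only for this illustration.
type Greeting struct {
	Text string `json:"text"`
}

// perRune emulates the rune-by-rune conversion described in this report.
func perRune(s string) []byte {
	by := make([]byte, 0, len(s))
	for _, r := range s {
		by = append(by, byte(r))
	}
	return by
}

// roundTrip marshals in, treats the JSON text as a string reply (an assumption
// standing in for the server response), converts it to bytes with conv, and
// unmarshals the result.
func roundTrip(in Greeting, conv func(string) []byte) (Greeting, error) {
	var out Greeting
	raw, err := json.Marshal(in)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(conv(string(raw)), &out)
	return out, err
}

func main() {
	in := Greeting{Text: "мир"}

	broken, err := roundTrip(in, perRune)
	fmt.Printf("per-rune: %q (err: %v)\n", broken.Text, err) // "<8@"

	fixed, err := roundTrip(in, func(s string) []byte { return []byte(s) })
	fmt.Printf("direct:   %q (err: %v)\n", fixed.Text, err) // "мир"
}
```

With other input the truncated bytes may not even be valid JSON (a control character inside a string literal, for instance), in which case json.Unmarshal returns an error instead of silently producing wrong text.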

breno12321 commented 2 years ago

I was having problems with Brazilian Portuguese accentuation and it worked!