JsonSerializer throws when serializing non UTF8 compatible ByteStrings

neo-project / neo

NEO Smart Economy

MIT License

3.46k stars 1.03k forks source link

JsonSerializer throws when serializing non UTF8 compatible ByteStrings #2716

Open devhawk opened 2 years ago

devhawk commented 2 years ago

JsonSerializer.Serialize and SerializeToByteArray assume any ByteString or Buffer they encounter can be converted to a string via StrictUTF8.GetString. However, ByteString and Buffer are intended to hold byte arrays of arbitrary length and contents, which are not always valid UTF8 strings.

If this was greenfield, I would argue that ByteString and Buffer should be encoded as Base64 strings for JSON serialization/deserialization. However, for backwards compat, we probably need to try calling GetString first and fall back to base64 encoding if GetString throws. For deserialization, we would need to first attempt base64 decoding via Convert.TryFromBase64String before treating the JSON value as a string

roschler commented 2 years ago

This bug is affecting during the Polaris hackathon.

Jim8y commented 2 years ago

This bug is affecting during the Polaris hackathon.

I would say it is not a bug......it is meant to be utf8,,,,,,,,,, and yes,,,,we could make it support non-utf8

erikzhang commented 2 years ago

JsonSerializer is only responsible for serializing StackItems into JSON objects directly. If you need base64, you have to call StdLib.Base64Encode or StdLib.Base64Decode by yourself.

roschler commented 2 years ago

It is very easy in as real-life context to accidentally throw in a non-UTF character. Perhaps by some popular GUID or other industry standard string building function commonly used by developers.

At least, the error message should be improved to return the exact field element that has the bad character in it. Currently the error message gives a general "translate error" error that doesn't help you in trying to trace out the property path of the field with the problem. My data object has several child array fields, some of them with child objects of their own.

Also, why not add some text to the error message like "Note, any string field with non-UTF characters must be base64 encoded". That way developers can see that helpful tip immediately, instead of hoping they find GitHub threads like these?

Anything we can do to remove pain points like these makes Neo N3 development that much more attractive to the developers out there. Developers who have (too many) other blockchain platforms to choose from, when deciding where to create dApps. I want them to choose Neo N3.

devhawk commented 2 years ago

I would say it is not a bug......it is meant to be utf8,,,,,,,,,, and yes,,,,we could make it support non-utf8

I don't understand "it is meant to be utf8". What is meant to be utf8? Strings are meant to be UTF8, but 160 and 256 bit hashes are decidedly NOT meant to be UTF8 and represented in NeoVM as ByteStrings just as strings are.

JsonSerializer is only responsible for serializing StackItems into JSON objects directly. If you need base64, you have to call StdLib.Base64Encode or StdLib.Base64Decode by yourself.

This isn't documented anywhere. Furthermore, it's a pretty poor developer experience that limits the use of JsonSerialize. If a developer needs to JsonSerialize an array of UInt160s, what do they do? Copy them into a new array, converting to Base64 encoding as they go?

erikzhang commented 2 years ago

The JsonSerializer is a low-level interface. If you want it to automatically identify the data type, distinguish a string and a byte array from a ByteString, or distinguish an array and an object from an Array, you need to provide a type descriptor, or even the function of reflection. This will be a lot of work. So I don't think we have plans to support it at the moment.