yknl closed this issue 4 years ago
Since ASCII text is valid UTF-8, the two can be combined into a single value type.
Correct, but there is a 4x cost saving when using the ASCII class.
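To make the two points above concrete, here is a minimal Python sketch. It assumes the 4x figure comes from budgeting the UTF-8 worst case of 4 bytes per character, versus 1 byte per character for ASCII. The function names are hypothetical and not part of Clarity.

```python
def worst_case_ascii_bytes(max_chars: int) -> int:
    """ASCII always takes exactly 1 byte per character."""
    return max_chars


def worst_case_utf8_bytes(max_chars: int) -> int:
    """UTF-8 needs at most 4 bytes per code point, so budget for 4."""
    return 4 * max_chars


# ASCII text produces identical bytes under both encodings,
# which is why the two types could in principle be merged.
text = "hello"
same_bytes = text.encode("ascii") == text.encode("utf-8")

# The pessimistic budget is what differs: 4x for a UTF-8 string.
ratio = worst_case_utf8_bytes(128) // worst_case_ascii_bytes(128)
```

Under this assumption the stored bytes for ASCII content are the same either way; only the worst-case size that has to be accounted for differs.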
> there is a 4x cost saving when using the ASCII class.
Compression schemes such as BOCU can be used to store Unicode code points efficiently.
When a character set has been declared, as proposed in https://github.com/clarity-lang/reference/issues/19, the encoding can be naively compressed so that its size is bounded by the bytes required to encode the code points in that set. This avoids pessimistically allocating up to 4 bytes per Unicode character.
More efficient compression schemes can also be applied. Perhaps there is an opening to devise a novel scheme tailored to the particular requirements of on-chain storage, including minimizing the pessimistic size estimate.
The wire format for Unicode text should declare the expected character set in addition to the encoding. See clarity-lang/reference#19 for motivation.
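As a rough illustration of the naive bound described here, the sketch below computes the widest UTF-8 encoding among the characters of a declared set and uses that as the per-character budget. The function names and the example character set are hypothetical; the proposal in issue 19 may define character sets quite differently.

```python
from typing import Iterable


def max_bytes_per_char(charset: Iterable[str]) -> int:
    """Widest UTF-8 encoding of any single character in the declared set."""
    return max(len(ch.encode("utf-8")) for ch in charset)


def pessimistic_size(max_chars: int, charset: Iterable[str]) -> int:
    """Worst-case byte size of a string of max_chars drawn from charset."""
    return max_chars * max_bytes_per_char(charset)


# Hypothetical declared set: basic Latin letters plus a few accented ones.
latin_with_accents = "abcdefghijklmnopqrstuvwxyz" + "éèüñ"

bounded = pessimistic_size(100, latin_with_accents)  # 2 bytes per character
unbounded = 4 * 100                                  # no declared set
```

With this hypothetical set, every character fits in 2 bytes of UTF-8, so the worst-case estimate for a 100-character string is halved relative to the 4-byte assumption.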
This is a breaking change to the existing wire format of transactions.
Why does this have to be a breaking change?
It is a breaking change in the wire format: because 2 new types are added, it will cause unexpected behaviour in apps that try to decode the transactions.
> it will cause unexpected behaviour in apps
@yknl Why does adding the two new types have to cause unexpected behavior in apps? Does it perhaps have to do with how the data is encoded in the wire format?
Clarity values are serialized with a 1-byte type ID prefix: https://github.com/blockstack/stacks-blockchain/blob/master/sip/sip-005-blocks-and-transactions.md#clarity-value-representation
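A minimal sketch of why a type ID prefix makes new types visible to existing decoders. It assumes a decoder that knows only two types. The ID values, payload layouts, and function name here are illustrative assumptions; SIP-005 is the authoritative source.

```python
# Assumed IDs for illustration only; consult SIP-005 for the real table.
TYPE_INT = 0x00
TYPE_BUFFER = 0x02


def decode_value(data: bytes):
    """Decode one value: a 1-byte type ID followed by a type-specific payload."""
    if not data:
        raise ValueError("empty input")
    type_id, payload = data[0], data[1:]
    if type_id == TYPE_INT:
        # Assumed layout: 16-byte big-endian signed integer.
        return int.from_bytes(payload[:16], "big", signed=True)
    if type_id == TYPE_BUFFER:
        # Assumed layout: 4-byte big-endian length, then the raw bytes.
        length = int.from_bytes(payload[:4], "big")
        return payload[4:4 + length]
    # A decoder written before a type existed has no branch for its ID.
    raise ValueError(f"unknown Clarity type ID: {type_id:#04x}")


encoded_int = bytes([TYPE_INT]) + (5).to_bytes(16, "big", signed=True)
encoded_buf = bytes([TYPE_BUFFER]) + (3).to_bytes(4, "big") + b"abc"
```

A decoder like this either fails or misbehaves on an ID it does not recognize, which is the sense in which adding types changes the wire format for existing apps.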
Two additional Clarity value types need to be added to support the change on the blockchain. This is a breaking change to the existing wire format of transactions.
Related PR: https://github.com/blockstack/stacks-blockchain/pull/1779