runtimeverification / plutus-core-semantics

BSD 3-Clause "New" or "Revised" License
27 stars 5 forks source link

Implement CBOR decoder for Data constants #258

Closed SchmErik closed 2 years ago

SchmErik commented 2 years ago

The plutus data constant has two representations: a textual representation and a CBOR-encoded bytestring format. We currently have an implementation of the former and we need to support the latter. Since CBOR is not readablefor humans, we would like to translate the CBOR bytestring into a textual representation during the return phase of the computation. By doing so, the semantics of builtin functions that operate on data can remain the same.

SchmErik commented 2 years ago

Important links:

CBOR spec: https://www.rfc-editor.org/rfc/rfc8949.pdf IoG's Data implementation: https://github.com/input-output-hk/plutus/blob/master/plutus-core/plutus-core/src/PlutusCore/Data.hs CBOR playground: https://cbor.me/

SchmErik commented 2 years ago

Decoding 0xd87980

0xd8 is a byte consisting of the top 3 bits and the bottom 5 bits.

110 11000 -> 24 type 3 (argument's value is held in the following 1 byte)

type 3 from the spec: A text string (Section 2) encoded as UTF-8 . The number of bytes in the string is equal to the argument. A string containing an invalid UTF-8 sequence is well-formed but invalid (Section 1.2). This type is provided for systems that need to interpret or display human-readable text, and allows the differentiation between unstructured bytes and text that has a specified repertoire (that of Unicode) and encoding (UTF-8). In contrast to formats such as JSON, the Unicode characters in this type are never escaped. Thus, a newline character (U +000A) is always represented in a string as the byte 0x0a, and never as the bytes 0x5c6e (the characters "\" and "n") nor as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and "a").

0x79 (decimal 121)

This is a tag that represents Contr 0. If this value were to be 0x7A, it would represent Contr 1

0x80

This represents an array with zero elements.

Therefore, d87980 represents the data term Constr 0 []

SchmErik commented 2 years ago

Section D.4

Section D.5

Section D.6

Section D.7

SchmErik commented 2 years ago

Specification has defined the decoder using function notation so this should be a straight translation of the spec to the k functions.