runtimeverification / plutus-core-semantics

BSD 3-Clause "New" or "Revised" License
27 stars 5 forks source link

Flat format parser #73

Open ChristianoBraga opened 2 years ago

ChristianoBraga commented 2 years ago

A K function that receives a string representing an Untyped Plutus Core (UPLC) program and produces an UPLC program.

SchmErik commented 2 years ago

To make development easier, I here's a list of items that need to be accomplished:

Parsing small version numbers

Parsing variable length data

Parsing error

Parsing constants

Parsing delay and force

Parsing application

Parsing builtins

Parsing variables

Parsing lambda abstractions

SchmErik commented 2 years ago

At the time of writing, all that remains is to implement support for pairs and lists that are encoded in the flat format. Here are the remaining tasks.

  1. properly parse type constants containing list/pair. The current implementation only parses types that aren't parametric. A type in flat format is expressed as a cons list where a single 1 bit indicates a list element followed by the type which spans 4 bits. A single 0 bit indicates the end of a list. Example: the type List(Integer) is encoded as a type application of list to integer. So the encoding is 7 (type application) 5 (list) 0 (integer). This is expressed as 1 (cons) 7 (type application) 1 (cons) 5 (list) 1 (cons) 0 (integer) 0 (nil) where cons and nil span 1 bit and everything else spans 4 bits.
  2. at the time of writing, there is a rule that matches on the type being parsed. Rather than matching explicitly, a constant parsing function should be extracted. This function should take a type constant as a parameter and parse a single constant of that type. For simple types, this would be implemented by the rule branches that match on the type constant to a separate function.
  3. parse the actual pair and list constants (which shouldn't be too difficult after steps 1 and 2). Lists are encoded as cons lists similar to the type constants encoded in step 1. Refer to section E.3.3 and E.3.4 in the July 26th spec for the encoding.