Support AVM7 and onward

geo2a commented 2 years ago

This issue will maintain a catch-all list for the opcodes not-yet-supported by the semantics.

Reference implementation and tests

The reference implementation of the opcodes can be looked up in https://github.com/algorand/go-algorand/blob/master/data/transactions/logic/opcodes.go

Another point of reference is https://github.com/algorand/go-algorand/blob/master/data/transactions/logic/assembler_test.go --- the assembler tests

See frame pointers (AVM7?): https://github.com/algorand/go-algorand/issues/4304
Box storage (AVM8): https://github.com/algorand/go-algorand/issues/3890
- See also the PyTeal support PR: https://github.com/algorand/pyteal/issues/300

nwatson22 commented 2 years ago

I looked into the built-in support for JSON parsing in K, relevant to the json_ref opcode that was added in version 7. It seems fairly limited for our use case. We have 2 functions, JSON2String and String2JSON, which convert to/from strings and JSON K objects (structured representation). These are both implemented as C++ hooks. This means that I'm not sure if there is any way to effectively use it with the Haskell backend, since if there is a JSON string on the stack (bytes type), and we try to convert it to a string and then to a JSON object, we are just left with the non-simplifying term String2JSON(Bytes2String(b"json text here")) With the LLVM backend, we can use these hooks, but there is an additional limitation in that I can't think of a good way to deal with the failure case. If the input string is not valid JSON, String2JSON calls abort() which causes there to be no output, so I don't see an easy way to make KAVM handle this case cleanly. Perhaps we could expand the hooks to include an isValidJSON() function.

nwatson22 commented 2 years ago

AVM 7

`json_ref`

Implementation begun in 153

`replace2` and `replace3`

Should be trivial, can be reduced to replace

`base64_decode`, `ed25519verify_bare`, `sha3_256`, and `vrf_verify`

Cryptographic opcodes

`block`

Queries information about individual blocks, i.e. their timestamp or seed. This cannot be implemented here as we are not modeling block creation.

AVM 8

`bury`, `popn`, and `dupn`

These should be simple to implement since they're basic stack manipulation opcodes.

Callstack opcodes: `proto`, `frame_dig`, and `frame_bury`

These are meant to extend the functionality of callsub and retsub to give AVM a full callstack for subroutines. Whereas our current <callStack> contains only program counters to return to when retsub is called, in AVM 7 the callstack consists of "frames" which contain that plus other data.

proto sets the number of argument values and the number of return values for the current top frame in the callstack.
callsub now initializes a new stackframe at the top of the callstack, and fills it with the next instruction address as well as the current stack height. (I'm guessing this is the frame pointer)
frame_dig and frame_bury are read/write to the stack relative to the frame pointer
retsub now has different behavior depending on whether proto was called since callsub. If not, it behaves as it did pre-v8. If it was, then it consumes the arguments from the stack, leaving just the return values.
switch

This branches to the nth label. This may require us to change the way labels are stored since it is currently a map which I don't think has any specific order, or add another map from index to label.

Box storage opcodes: box_create, box_extract, box_replace, box_del, box_len, box_get, box_put

Box storage is a mapping from accounts x box names (bytes) to bytes, which is stored permanently like global and local storage. Boxes are created and destroyed with box_create and box_del. Doing so modifies the minimum balance of the application account by 2500 + 400 * (n+s), where n is the length of the box name and s is the declared size of the box. Once created these can be read from and written to with box_extract and box_replace, to read parts of the byte string, and box_get and box_put, to read or write the entire byte string at once, if it is small enough to fit on the stack. box_len simply queries the length of the box. This should be easy to implement in KAVM. All we need to do is add an additional cell under <accounts> which is a map from box names to their associated storage. Additionally, AVM8 adds more fields to acct_params_get, including one for the number of boxes owned by an account.

The availability of boxes is a bit more complicated. It seems like there is an additional transaction field for "box references", which is a list of pairs of ints and box names. This is not accessible by txna, but it allows applications to access boxes of other applications. To determine availability we need to aggregate the box reference lists of all the transactions in the group, and this aggregated list determines availability as well as telling us which box name refers to a box owned by which account. One thing I'm not sure of yet is what happens if you try to input two different box references, both with the same name but owned by different applications.

runtimeverification / avm-semantics