explorer: Account and instruction deserialization and labelling is manual

jstarry commented 3 years ago

Problem

Account and instruction deserialization and labelling are done manually, which does not scale to all community-created programs

Proposed Solution

Leverage an IDL spec to automatically label and deserialize account state and instruction params
Review the IDL under development in https://github.com/project-serum/anchor
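
To make the proposal concrete, here is a minimal sketch of IDL-driven decoding, assuming borsh-style encoding (little-endian integers, a one-byte bool, strings as a u32 length prefix followed by UTF-8 bytes). The names `FieldType`, `Value`, `Field`, and `decode_labelled` are hypothetical and are not taken from Anchor or the explorer; a real IDL would cover many more types.

```rust
/// Hypothetical, trimmed-down set of field types an IDL might describe.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    U8,
    U64,
    Bool,
    String,
}

/// A decoded value, paired with its label by `decode_labelled`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    U8(u8),
    U64(u64),
    Bool(bool),
    String(String),
}

/// One field of an account or instruction layout, as an IDL would list it.
pub struct Field {
    pub name: &'static str,
    pub ty: FieldType,
}

/// Split `n` bytes off the front of `data`, advancing the cursor.
fn take<'a>(data: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let slice: &'a [u8] = *data;
    if slice.len() < n {
        return None;
    }
    let (head, tail) = slice.split_at(n);
    *data = tail;
    Some(head)
}

/// Walk the field list in order and decode each value from `data`.
/// Returns None on truncated or malformed input; trailing bytes are ignored.
pub fn decode_labelled(fields: &[Field], mut data: &[u8]) -> Option<Vec<(String, Value)>> {
    let mut out = Vec::with_capacity(fields.len());
    for field in fields {
        let value = match field.ty {
            FieldType::U8 => Value::U8(take(&mut data, 1)?[0]),
            FieldType::U64 => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(take(&mut data, 8)?);
                Value::U64(u64::from_le_bytes(buf))
            }
            FieldType::Bool => match take(&mut data, 1)?[0] {
                0 => Value::Bool(false),
                1 => Value::Bool(true),
                _ => return None,
            },
            FieldType::String => {
                let mut len_buf = [0u8; 4];
                len_buf.copy_from_slice(take(&mut data, 4)?);
                let len = u32::from_le_bytes(len_buf) as usize;
                let bytes = take(&mut data, len)?;
                Value::String(String::from_utf8(bytes.to_vec()).ok()?)
            }
        };
        out.push((field.name.to_string(), value));
    }
    Some(out)
}
```

With a field list like this, the explorer would only need the list itself (fetched from wherever the IDL lives) to label raw bytes, rather than hand-written code per program.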

jstarry commented 3 years ago

@armaniferrante could you share the status of the IDL you've developed as part of Anchor and whether you think it would be a good solution to this issue?

armaniferrante commented 3 years ago

The anchor IDL is defined by the JSON serialization of the struct here. Some examples can be found here. It's currently used to generate clients with @project-serum/anchor.

It could definitely be used for this issue. To start, one might want to trim down the IDL linked above (which has some anchor specific concepts like events, state, and errors) and begin with the basics, e.g., something like,

pub struct Idl {
    pub version: String,
    pub name: String,
    pub instructions: Vec<IdlIx>,
    pub accounts: Vec<IdlTypeDef>,
    pub types: Vec<IdlTypeDef>,
}

Other than bikeshedding the nitty-gritty details of the exact JSON format, I think the main thing missing for this issue would be a serialization format field, because the IDL currently assumes everything is borsh serialized. Depending on what current programs on Solana are doing, we might want to allow other formats like bincode. The main challenge that comes to mind is what to do about programs with custom serialization, like the Serum DEX.
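
As a rough sketch of what the trimmed-down IDL could look like with a serialization format field added: the `IdlSerialization` enum, the `serialization` field, and the `is_generically_decodable` helper are assumptions for illustration, and `IdlIx` / `IdlTypeDef` are reduced to placeholders so the block compiles on its own. None of this is the actual Anchor definition.

```rust
/// Hypothetical serialization format tag; Borsh is the default because the
/// current IDL assumes it everywhere.
#[derive(Debug, Clone, PartialEq)]
pub enum IdlSerialization {
    Borsh,
    Bincode,
    /// Programs with hand-rolled layouts; the string names the scheme.
    Custom(String),
}

impl Default for IdlSerialization {
    fn default() -> Self {
        IdlSerialization::Borsh
    }
}

/// Placeholder for an instruction definition.
#[derive(Debug, Clone, Default)]
pub struct IdlIx {
    pub name: String,
}

/// Placeholder for an account or type definition.
#[derive(Debug, Clone, Default)]
pub struct IdlTypeDef {
    pub name: String,
}

#[derive(Debug, Clone, Default)]
pub struct Idl {
    pub version: String,
    pub name: String,
    /// Assumed new field: how instruction and account data are encoded.
    pub serialization: IdlSerialization,
    pub instructions: Vec<IdlIx>,
    pub accounts: Vec<IdlTypeDef>,
    pub types: Vec<IdlTypeDef>,
}

impl Idl {
    /// True when a generic decoder could handle this program without
    /// program-specific code.
    pub fn is_generically_decodable(&self) -> bool {
        !matches!(self.serialization, IdlSerialization::Custom(_))
    }
}
```

A consumer such as the explorer could branch on this field and fall back to its existing manual decoders when the format is custom.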

Additionally for this issue, it's important that IDLs live at some deterministic address on chain, so that apps like the explorer can query the IDL with nothing but the program ID and make sense of the instruction data. Anchor uses a PDA with fixed seeds for this (the macro codegen bakes some extra instructions into the program to do this), but this of course doesn't work for non-Anchor programs, so there probably needs to be some type of associated IDL program for this, as I've briefly discussed with @bartosz-lipinski. (Fwiw I think my ideal solution would be to bake the IDL into the bytecode itself, using something like a custom section in .wasm files. But I'm not sure if that's possible with BPF and so the associated IDL program is probably the most realistic solution. Edit: I agree with @jstarry's comment below that baking it into the program is a bad idea, as it would make deserializing transactions sent to old versions of the program more difficult.)

jstarry commented 3 years ago

Thanks @armaniferrante for the details!

+1 to a serialization format field

Starting with borsh support sounds good. Not sure of the best route for bincode / custom deserialization. Could we leverage wasm for this?

I like the idea of a deterministic place for IDLs! If we included it in the bytecode, then we might lose the ability to deserialize historical transactions whenever a program is upgraded. Because of upgrades, I think the seed should include both the program id and the program's last upgraded slot.
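
A small sketch of the seed idea, under stated assumptions: the `b"idl"` prefix, the function names, and the inclusive slot boundary are all hypothetical. The seeds would then be handed to something like `Pubkey::find_program_address` with the associated IDL program's ID, which is left out here to keep the block dependency-free.

```rust
/// Hypothetical fixed prefix so IDL accounts do not collide with other PDAs.
pub const IDL_SEED_PREFIX: &[u8] = b"idl";

/// Build the seed list for the IDL account of `program_id` as deployed at
/// `last_upgraded_slot`. Each seed stays within Solana's 32-byte seed limit.
pub fn idl_seeds(program_id: &[u8; 32], last_upgraded_slot: u64) -> Vec<Vec<u8>> {
    vec![
        IDL_SEED_PREFIX.to_vec(),
        program_id.to_vec(),
        last_upgraded_slot.to_le_bytes().to_vec(),
    ]
}

/// Pick which deployment a historical transaction ran against: the latest
/// upgrade slot that is not after the transaction's slot.
/// Assumption: an upgrade is treated as effective in its own slot.
pub fn upgrade_slot_for(tx_slot: u64, upgrade_slots: &[u64]) -> Option<u64> {
    upgrade_slots
        .iter()
        .copied()
        .filter(|&slot| slot <= tx_slot)
        .max()
}
```

To decode an old transaction, the explorer would first resolve the upgrade slot for the transaction's slot, then derive the IDL address from the resulting seeds. This presumes the list of upgrade slots is available from somewhere, which the thread does not settle.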

oJshua commented 3 years ago

Tracking this issue. @bartosz-lipinski and I have also been discussing an associated IDL / ABI-type program. This would help our Explorer efforts immensely. We've also considered including some additional metadata e.g. project name, github url.

armaniferrante commented 3 years ago

Some additional details that may or may not be relevant.

Anchor uses sighash-based method dispatch (rather than enum-based dispatch). So anchor instruction data looks like sha256(rust-ix-struct-ident)[..8] || borsh(rust-ix-struct). Sighash is important to be able to support features like program interfaces (example definition and impl), where a program wants to call another program without assuming anything other than that it implements the interface. This isn't possible with enum-based dispatch, since there may be a collision in the enum variant discriminator. Anchor addresses this with sighash by namespacing methods (e.g., by prefixing the #[interface] trait name in the sha256 pre-image).
To support IDL versioning, we may also want to adopt a convention to prepend a version number to the instruction data, so that the explorer can look at a transaction, see the version, fetch the IDL with the version as one of the seeds, and then decode the data. This poses a problem: if such a versioning scheme were embedded into all new instructions, how would the explorer be able to tell the difference between the old world and the new world? One way would be to prepend a magic number prefix, signalling the new world. Not sure if there are others.
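
The layout described above can be sketched as a small parser. The 8-byte discriminator followed by the payload matches the sighash layout described in this comment; the `MAGIC` bytes, the one-byte version, and the names `ParsedIx` and `parse_ix_data` are purely hypothetical, since no such convention exists yet.

```rust
/// Hypothetical magic prefix marking "new world" (versioned) instruction data.
pub const MAGIC: [u8; 4] = [b'I', b'D', b'L', b'v'];

#[derive(Debug, PartialEq)]
pub struct ParsedIx<'a> {
    /// Some(version) when the magic prefix is present, None for legacy data.
    pub idl_version: Option<u8>,
    /// First 8 bytes of the sighash, used for method dispatch.
    pub discriminator: [u8; 8],
    /// Remaining bytes: the serialized instruction arguments.
    pub payload: &'a [u8],
}

/// Split instruction data into optional version, discriminator and payload.
/// Returns None if the data is too short to hold a discriminator.
pub fn parse_ix_data<'a>(data: &'a [u8]) -> Option<ParsedIx<'a>> {
    // New world: MAGIC || version (1 byte) || discriminator || payload.
    // Old world: discriminator || payload.
    let (idl_version, rest) = if data.len() > MAGIC.len() && data.starts_with(&MAGIC) {
        (Some(data[MAGIC.len()]), &data[MAGIC.len() + 1..])
    } else {
        (None, data)
    };
    if rest.len() < 8 {
        return None;
    }
    let (head, payload) = rest.split_at(8);
    let mut discriminator = [0u8; 8];
    discriminator.copy_from_slice(head);
    Some(ParsedIx {
        idl_version,
        discriminator,
        payload,
    })
}
```

One caveat with this sketch: a legacy instruction whose first bytes happen to equal the magic prefix would be misread as versioned, so the prefix would have to be long enough (or otherwise chosen) to make that collision acceptably unlikely.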

coudron commented 3 years ago

I was playing around with a crude implementation of this feature for Anchor IDLs. It doesn't take into consideration non-anchor IDLs, upgraded programs, etc.

It just does a basic check to see if there's an Anchor IDL for the account's owner; if so, it decodes the account and displays the info in a tab in the details section.

Curious what you all think of this as a first step?

PR here: https://github.com/solana-labs/solana/pull/20745

ngundotra commented 2 years ago

Existing Anchor IDLs are used to deserialize account data in the explorer (as of #23972 and #24239).

Would love to revive this conversation and set out some concrete steps to go further.

Low-hanging fruit:

How should we differentiate between known program names & program names deserialized from the Anchor IDL?
How should we display decoded Anchor events found in the program logs?

@jstarry @armaniferrante @oJshua @bartosz-lipinski

solana-labs / solana

explorer: Account and instruction deserialization and labelling is manual #16180
