ipld / specs

Content-addressed, authenticated, immutable data structures

Filecoin: initial pass at describing the block forms #329

Closed: rvagg closed this 3 years ago

rvagg commented 3 years ago

This builds on @willscott's excellent work describing the basic forms in ipld-prime schemagen lingo at https://github.com/filecoin-project/statediff/tree/master/types/gen, and on work I did with https://github.com/rvagg/car-to-schema to reverse-engineer the block forms in Filecoin extracts.

I've run out of steam today and there are 4 actors I haven't included yet.

I also haven't touched the message types yet, but my understanding is that these are all serialized to bytes before being stored in the Params field of the Messages. So we have a DAG-CBOR-within-DAG-CBOR situation, which makes it difficult to express their "links" in our schema language!

A lot more supporting information could accompany these schemas, but we'd need to work out what we're trying to do here that doesn't belong in the specs. Or maybe this work should eventually move there.

/cc @Satoshi-Kusumoto, who I believe is working on a diagram of the chain and may find this data useful.

warpfork commented 3 years ago

Some partially completed thoughts coming out of a partial review of this and also some conversation with @willscott today:

I'm excited about all this :)

willscott commented 3 years ago

One nit we'll have to figure out how to address, and that we don't currently encode: before some block height (I forget exactly, but somewhere after 120,000 and before ~160,000), BlockHeader.ParentStateRoot is a direct &ActorsHAMT rather than the &StateRoot we encode it as. (This is the point of the v2 actors migration.) I don't know if we'll have a great way to represent that implicit union.
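The implicit union can be sketched as a resolver that needs information from outside the block, namely the chain height, to decide what ParentStateRoot points at. This is a hypothetical Python sketch with several simplifications:

- `switch_height` is a parameter because the exact migration height isn't pinned down here.
- The strict `<` boundary is an assumption.
- `StateRoot` is reduced to the single `actors` link that matters for the example.
- The CID strings are placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StateRoot:
    """Simplified stand-in for the StateRoot wrapper (only the actors link)."""
    actors: str  # placeholder CID string of the actors HAMT root


def actors_hamt_root(
    height: int,
    parent_state_root: str,
    load_state_root: Callable[[str], StateRoot],
    switch_height: int,
) -> str:
    """Resolve BlockHeader.ParentStateRoot to the actors HAMT root.

    Before `switch_height` the link points straight at the HAMT; from then
    on it points at a StateRoot whose `actors` field links to the HAMT.
    """
    if height < switch_height:
        return parent_state_root
    return load_state_root(parent_state_root).actors


# Usage with placeholder data; the real switch height is NOT 150,000 necessarily.
SWITCH = 150_000
blocks = {"bafy-stateroot": StateRoot(actors="bafy-hamt-new")}
old_root = actors_hamt_root(100_000, "bafy-hamt-old", blocks.__getitem__, SWITCH)
new_root = actors_hamt_root(200_000, "bafy-stateroot", blocks.__getitem__, SWITCH)
```

Nothing in the linked block itself says which branch applies. A schema union discriminates on the data itself, so it can't capture this.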

rvagg commented 3 years ago

Chain height being used as a signal for differentiation is going to be a problem? Who could have foreseen this??

A change to the AMT is coming down the pipe too: it's getting a flexible width, which may bump to something larger (maybe 32, like the HAMT?), with the width embedded in the root 🤞. So the versioning will extend to ADLs shortly.
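Here is a rough illustration of why the width needs to live in the root. Assume the usual AMT layout, in which a node has 2**bitWidth slots and a node at height h covers width**(h+1) indexes. That is our reading of go-amt-ipld, not something stated in this thread. Under that layout, the same index follows a different path through the tree depending on the width. `amt_path` is a hypothetical helper.

```python
from typing import List


def amt_path(index: int, bit_width: int, height: int) -> List[int]:
    """Slot taken at each level (root first) to reach `index` in an AMT
    whose nodes have 2**bit_width slots and whose root is at `height`."""
    width = 1 << bit_width
    if index >= width ** (height + 1):
        raise ValueError("index does not fit in a tree of this height")
    path = []
    for level in range(height, -1, -1):
        span = width ** level  # indexes covered by one slot at this level
        path.append(index // span)
        index %= span
    return path


path_w8 = amt_path(100, bit_width=3, height=2)   # width 8
path_w32 = amt_path(100, bit_width=5, height=1)  # width 32
```

With width 8, index 100 walks slots [1, 4, 4]. With width 32 it walks [3, 4]. A reader that assumes the wrong width therefore reads the wrong entries.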

Re mixing the ADL schemas in with the rest: ack on the concern, but what I'm trying to balance is a raw "what am I seeing when I decode these blocks" view against "what IPLD would like you to see at a logical level" (which includes ADLs presenting these blocks as something else entirely). So I agree more organisation is in order, and there's tons of duplication that could easily go away. But since most of the ADLs have values embedded in them, I don't want to lose that critical information for readers who won't be familiar with the ADL concept, which will be difficult to explain here for those readers.

So, this first pass is really raw and super verbose. Future passes can either fix this or introduce new versions of the doc with a simplified view.

rvagg commented 3 years ago

Fleshed out the rest and added a doc for message params. I still don't have a complete grasp of how messages are used, but I've added a note that they are encoded as DAG-CBOR and the resulting bytes placed into the Message#Params field.

I don't have direct experience decoding any of these message params, but I have gone through both the v0 and v2 actors code and checked that they're all there.

I don't know about returns, though: do the message return types get encoded anywhere? Do we need to list those too?

Anyway, this is ready for review. Maybe we just get this in as it is and iterate further once it's landed?

willscott commented 3 years ago

Re message returns: they end up in the 'Return' bytes field of the receipts in the ParentMessageReceipts AMT, a second layer of CBOR encoding, just like params on the way in.
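The read side mirrors the params case. First decode the receipt, then decode its Return bytes as a second, separate CBOR document. Below is a minimal Python sketch with a hand-rolled decoder for a tiny CBOR subset (non-negative ints, byte strings, text strings, arrays). The receipt is assumed to be the tuple [ExitCode, Return, GasUsed], and the `[5, "ok"]` return value is a made-up example.

```python
from typing import Any, Tuple


def _decode(buf: bytes, pos: int) -> Tuple[Any, int]:
    """Decode one item of a tiny CBOR subset starting at `pos`."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        arg = info
    elif info in (24, 25, 26, 27):
        size = 1 << (info - 24)  # argument is 1, 2, 4 or 8 bytes
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError(f"unsupported additional info {info}")
    if major == 0:  # unsigned int
        return arg, pos
    if major == 2:  # byte string
        return buf[pos:pos + arg], pos + arg
    if major == 3:  # text string
        return buf[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:  # array
        items = []
        for _ in range(arg):
            item, pos = _decode(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unsupported major type {major}")


def decode(buf: bytes) -> Any:
    value, end = _decode(buf, 0)
    if end != len(buf):
        raise ValueError("trailing bytes")
    return value


# Hypothetical receipt: ExitCode 0, Return = CBOR of [5, "ok"], GasUsed 1000.
receipt_bytes = bytes([0x83, 0x00, 0x45, 0x82, 0x05, 0x62]) + b"ok" + bytes([0x19, 0x03, 0xE8])

exit_code, return_bytes, gas_used = decode(receipt_bytes)  # first layer
return_value = decode(return_bytes)                         # second layer
```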

rvagg commented 3 years ago

Finished my v2/v0 check and fixup. I also added all of the inline docs I could find about some of these structures and fields. I've gone through the messages, added returns, and added versioned forms too.

I'd like to merge this as it is now. Further iteration can go into extracting the ADLs more creatively. I'll have to think about that, or someone else can. But I think this is solid as it is.

I'll note for onlookers that type aliases aren't supported by all of our schema tooling; in particular, go-ipld-prime doesn't yet have a way to deal with them. This wouldn't quite work if you tried to wire it up in go-ipld-prime:

type MessageParamsMultisigV2Propose MessageParamsMultisigV0Propose
type MessageReturnMultisigV2Propose MessageReturnMultisigV0Propose

But I've found it very useful for documentation purposes, and I'd like to make sure we have a path to getting it working. We have a copy type in Schemas, but we're discussing ditching that and perhaps adding proper aliases like this instead.
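Python offers a loose analogy for the alias-versus-copy distinction, though this is not how the schema DSL or go-ipld-prime implements either one. An alias is a second name for the same type. A copy is a distinct type that happens to have the same fields. The field names on `MessageParamsMultisigV0Propose` below are simplified placeholders, not the authoritative schema.

```python
from dataclasses import dataclass, fields, make_dataclass


@dataclass
class MessageParamsMultisigV0Propose:
    # Placeholder fields for illustration only.
    to: bytes
    value: bytes
    method: int
    params: bytes


# Alias: the V2 name refers to the very same type as V0.
MessageParamsMultisigV2Propose = MessageParamsMultisigV0Propose

# Copy: a distinct type that merely shares V0's field list.
MessageParamsMultisigV2ProposeCopy = make_dataclass(
    "MessageParamsMultisigV2ProposeCopy",
    [(f.name, f.type) for f in fields(MessageParamsMultisigV0Propose)],
)

aliased = MessageParamsMultisigV2Propose(b"to", b"val", 2, b"")
copied = MessageParamsMultisigV2ProposeCopy(b"to", b"val", 2, b"")
```

Values built through the alias are V0 values. Values built through the copy are structurally identical but are a different type, so code that works with one won't automatically accept the other.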