Open pchiusano opened 5 years ago
I'm moving this off of M1 (but open to PRs from new contributors!!). The codebase format is already going to be versioned separately, so it's less important that the base58-encoded hashes of that format include Unison version information (and the hash algorithm) now. We could choose to add this info in a later version of the codebase format, or not, since it will be redundant - within a version of the codebase format, all the hashes will be of the same type.
I see this as being more useful when displaying hashes to users and when sharing copy-pastable hashes that can have an unambiguous meaning even as Unison evolves. It might also prove useful in the implementation of the Unison inter-node protocol so we can keep that in mind, too.
Here's a proposed self-describing hash, using multiformats:
<multibase><unison-multicodec-id><unison-version-id><multihash>
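As a rough illustration of this layout, here is a minimal Haskell sketch that assembles the header bytes in front of a raw digest and renders the result as base58 text with the `z` multibase prefix. The names `selfDescribing`, `encodeBase58`, and `renderHash` are hypothetical and not taken from `Unison.Hash`. The multicodec id `0x7e` is a made-up placeholder, since no Unison entry is assumed to exist in the community table. Tagging the digest as sha3-512 (multihash code `0x14`) is also an assumption. The sketch only handles header values that fit in a single varint byte.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- HYPOTHETICAL placeholder: a made-up id, not a registered multicodec entry.
unisonMulticodecId :: Word8
unisonMulticodecId = 0x7e

-- From the proposal: the single entry of the Unison-specific version table.
unisonVersion1 :: Word8
unisonVersion1 = 1

-- Multihash code for sha3-512. Which algorithm gets tagged is an assumption.
sha3_512 :: Word8
sha3_512 = 0x14

-- <unison-multicodec-id><unison-version-id><hash-algo><hash-len-in-bytes><hash-value>
-- Multiformats encode each header field as an unsigned varint; values below
-- 0x80 occupy exactly one byte, which is the only case handled here.
selfDescribing :: BS.ByteString -> BS.ByteString
selfDescribing digest
  | len >= 0x80 = error "digest length does not fit in a one-byte varint"
  | otherwise =
      BS.append
        (BS.pack [unisonMulticodecId, unisonVersion1, sha3_512, fromIntegral len])
        digest
  where
    len = BS.length digest

-- Bitcoin base58 alphabet (no 0, O, I, or l).
base58Alphabet :: String
base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

-- Plain base58 encoding: each leading zero byte becomes '1', the rest is the
-- big-endian integer value written in base 58.
encodeBase58 :: BS.ByteString -> String
encodeBase58 bytes = replicate leadingZeros '1' ++ go value ""
  where
    leadingZeros = BS.length (BS.takeWhile (== 0) bytes)
    value = BS.foldl' (\acc b -> acc * 256 + toInteger b) (0 :: Integer) bytes
    go 0 acc = acc
    go n acc =
      let (q, r) = n `quotRem` 58
       in go q (base58Alphabet !! fromInteger r : acc)

-- Text form: the multibase prefix 'z' followed by the base58-encoded bytes.
renderHash :: BS.ByteString -> String
renderHash digest = 'z' : encodeBase58 (selfDescribing digest)
```

Loaded in GHCi, `renderHash (BS.replicate 64 0xab)` gives a string that starts with `z`, and `BS.take 4 (selfDescribing digest)` exposes the four header bytes a reader would inspect to learn the codec, Unison version, hash algorithm, and digest length.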
Notes:
- `<multibase>` will just be `z` for bitcoin base 58 if we're rendering the hash as text.
- `<multihash>` format is just `<hash-algo><hash-len-in-bytes><hash-value>`.
- `<unison-multicodec-id>` is added to the community table. This is the only thing that will be added to that table. The idea is we'd like to avoid spamming that community table every time there's a new version of Unison, and we don't want that to be a bottleneck for doing releases of Unison.
- `<unison-version-id>` references a Unison application-specific multicodec table. (Initially, the "table" will just have one entry in it, `Hash.unisonVersion1 = 1 :: Word8`, just stored in the Unison source itself.)

I'd be open to a PR for this; I would just edit the `Unison.Hash` module. Some implementation notes (assuming the above sounds good):
- The `Hash` type could still be a `Hash ByteString`, but those bytes will be:
  `<multibase><unison-multicodec-id><unison-version-id><multihash>`
  - `<multihash>` is just `<hash-algo><hash-len-in-bytes><hash-value>`.
- Update the `base58` and `fromBase58` functions accordingly.
- Update the `Accumulate` instance here to prepend `<unison-multicodec-id><unison-version-id><hash-algo><hash-len-in-bytes>` to the raw bytes produced by the hash.

Will this be a path towards allowing people to seamlessly store all their public Unison code on IPFS?
I'm a huge fan of all these CAS-focused projects, as I think it is absolutely the future, however I think a huge chunk of the benefit is being able to store all public CAS content on a single decentralized network.
Hi, newcomer here! I am also a huge fan of CAS (IPFS and IPLD mostly). As I see things, the AST could be represented as an IPLD object that links to other code by CID, and we could use an IPLD store (IPFS) as a backend for ucm to handle the burden of storing artifacts. We would also get code sync and test-result sync for free, just by resolving CIDs on IPFS first. Todo:
IPLD also defines an archive format that could be used in the future as a binary format for standalone executables and/or libraries, just by providing an IPLD store implementation for ucm.
I will probably open an issue for this later for comments :)
Did you get anywhere with this @zipper97412?
I can try to take a stab at this if @zipper97412 is busy.
@pchiusano What's the reason for using a custom `unison-multicodec` over `dag-cbor`? Filecoin, for example, uses `dag-cbor`. This will give you a lot more interop with existing and future tooling, as `dag-cbor` is more or less the preferred multicodec outside of some `dag-pb` for files and folders.
See #459