ipfs / go-cid

Content ID v1 implemented in go
MIT License
Provide an efficient API to check whether a CID has `IDENTITY` multihash code #133

Closed masih closed 3 years ago

masih commented 3 years ago

CIDs with the multihash code IDENTITY typically require special handling when encountered in blockstores. This is because such CIDs contain the data within themselves: the data is simply the multihash digest of that CID, since the multihash code IDENTITY corresponds to the copy (identity) hash function.
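To make the "data is the digest" point concrete, here is a minimal sketch that uses only the Go standard library rather than go-multihash. It assumes the standard multihash layout of `<varint code><varint length><digest>` and the IDENTITY code `0x00`; the helper names `encodeIdentity` and `inlineData` are hypothetical and not part of any library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// identityCode is the multihash code of the identity ("copy") function.
const identityCode = 0x00

// encodeIdentity (hypothetical helper) builds an identity multihash:
// <varint code><varint length><data>. The digest is the data itself.
func encodeIdentity(data []byte) []byte {
	buf := binary.AppendUvarint(nil, identityCode)
	buf = binary.AppendUvarint(buf, uint64(len(data)))
	return append(buf, data...)
}

// inlineData (hypothetical helper) returns the embedded data when mh is an
// identity multihash. ok is false for any other multihash code.
func inlineData(mh []byte) (data []byte, ok bool, err error) {
	code, n := binary.Uvarint(mh)
	if n <= 0 {
		return nil, false, errors.New("bad multihash code varint")
	}
	if code != identityCode {
		return nil, false, nil
	}
	length, m := binary.Uvarint(mh[n:])
	if m <= 0 {
		return nil, false, errors.New("bad multihash length varint")
	}
	digest := mh[n+m:]
	if uint64(len(digest)) != length {
		return nil, false, errors.New("digest length mismatch")
	}
	return digest, true, nil
}

func main() {
	mh := encodeIdentity([]byte("hello"))
	data, ok, err := inlineData(mh)
	fmt.Println(string(data), ok, err) // prints: hello true <nil>
}
```

A blockstore that sees such a CID can therefore serve the block straight from the key, without touching its backing storage.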

To handle them gracefully, a check is needed to determine whether a given CID has the IDENTITY code, and that check would have to run for almost all operations on the blockstore API. It is therefore highly desirable for the check to be as efficient as possible.

The current APIs offered provide two ways to perform the check:

  1. cid.Prefix().MhType
  2. decoding cid.Hash() via the go-multihash API to extract the code

Blockstore implementations would benefit from an API that checks whether a given CID, or the digest of a CID, has the IDENTITY code in a "fail-fast" manner. That is, the check would return as soon as it can tell that a CID is not IDENTITY, instead of first validating the CID, then decoding the digest, and only then comparing the multihash code.

The rationale for a "fail-fast" check is:

  1. if a CID does not have the IDENTITY multihash code, it doesn't always need to be fully decoded in order for a block to be returned (e.g. when the CID is used as a key in a map)
  2. the majority of CIDs interacted with are not IDENTITY; therefore, we want to pay the price of decoding only when we have to, and certainly not for every call to the blockstore.
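One way to picture such a fail-fast check is the sketch below. It is not go-cid's API: `hasIdentityCode` is a hypothetical helper that works on the raw binary form of a CID using only the standard library. It assumes the usual layouts, namely that CIDv0 is a bare 34-byte sha2-256 multihash (`0x12 0x20 ...`) and that CIDv1 is `<varint version><varint codec><multihash>`. It reads just enough varints to reach the multihash code and never validates or decodes the digest.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// hasIdentityCode (hypothetical helper) reports whether the binary CID in
// cidBytes carries multihash code 0x00 (IDENTITY). It deliberately skips
// validation of the CID and never looks at the digest.
func hasIdentityCode(cidBytes []byte) bool {
	// CIDv0 is a bare sha2-256 multihash, so it can never be IDENTITY.
	if len(cidBytes) == 34 && cidBytes[0] == 0x12 && cidBytes[1] == 0x20 {
		return false
	}
	// Skip the version varint.
	_, n := binary.Uvarint(cidBytes)
	if n <= 0 {
		return false
	}
	rest := cidBytes[n:]
	// Skip the codec varint.
	_, n = binary.Uvarint(rest)
	if n <= 0 {
		return false
	}
	rest = rest[n:]
	// The next varint is the multihash code; stop here.
	code, n := binary.Uvarint(rest)
	return n > 0 && code == 0x00
}

func main() {
	// CIDv1, raw codec (0x55), identity multihash wrapping "hello".
	identity := append([]byte{0x01, 0x55, 0x00, 0x05}, "hello"...)
	// CIDv1, raw codec (0x55), sha2-256 multihash with a zeroed digest.
	sha := append([]byte{0x01, 0x55, 0x12, 0x20}, make([]byte, 32)...)
	fmt.Println(hasIdentityCode(identity), hasIdentityCode(sha)) // prints: true false
}
```

For the common non-IDENTITY case the function returns after reading three short varints, which is the cost profile the rationale above asks for.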

I therefore propose to:

masih commented 3 years ago

As shown by the benchmarks in #134, the gains compared with using the existing Cid.Prefix are small.

This means that users who wish to check for IDENTITY should use Cid.Prefix since it is more efficient than multihash.Decode.

masih commented 3 years ago

To document the efficiency of the existing APIs for the IDENTITY check, benchmarks are added in #135.