Closed alanshaw closed 3 years ago
Hi, I would like to help with this issue.
I read the decoding algorithm and it says "If it's 34 bytes long with the leading bytes [0x12, 0x20, ...], it's a CIDv0". Looking at the tests, all the inputs for CIDv0 start with Qm and are 46 characters long.
Could you provide me a valid CIDv0 that doesn't start with Qm and/or is not 46 characters long? So I can write some unit tests and try to solve this issue.
This is for case (1) of https://github.com/multiformats/cid#decoding-algorithm where the CID is a string.
Sorry, maybe I didn't understand the issue.
You said we should check if the string begin with Qm and is 46 chars long. If we add that check and the string doesn't pass, what should we do?
Follow the part of the spec that says "Otherwise, decode it according to the multibase spec..."
As I understand it, a cid can be v0 if:
That's why I'm asking for a cid example that fullfils the second clause. I want to write a unit test with that cid as an input
> new CID('QmV88khHDJEXi7wo6o972MZWY661R9PhrZW6dvpFP6jnMn').multihash.join(',')
'18,32,100,204,247,215,142,116,181,80,84,227,13,252,171,193,40,208,244,156,33,84,136,182,185,131,50,55,21,110,224,183,153,169'
> new CID('QmV88khHDJEXi7wo6o972MZWY661R9PhrZW6dvpFP6jnMn').multihash.length
34
> new CID('QmV88khHDJEXi7wo6o972MZWY661R9PhrZW6dvpFP6jnMn').toString()
'QmV88khHDJEXi7wo6o972MZWY661R9PhrZW6dvpFP6jnMn'
> new CID('QmV88khHDJEXi7wo6o972MZWY661R9PhrZW6dvpFP6jnMn').toString().length
46
v0 is encoded in byte form as a pure multihash, hence that rule #2. v1 adds two varint's at the front of the byte form.
I'm doing byte-decoding of CIDs for my JS CAR file format library which handles this: https://github.com/rvagg/js-datastore-car/blob/9eb82a9c88ac4051f8c93901da366f82f910e49c/lib/coding-browser.js#L66-L71 Some Go code doing same thing: https://github.com/ipfs/go-car/blob/f188c0e24291401335b90626aad4ba562948b525/util/util.go#L23-L26
I don't think this issue is calling for this second case, it's just about being more strict with checking the first two characters rather than jumping straight into a straight base58 decode.
Although, it would have been great to be able to call into js-cid for the decode I did in js-datastore-car, go-car has the same problem where it's doing work that should be exposed by go-cid. "Here's a byte array, it could be v0 or v1, figure it out and give me a CID". The problem we have with the CAR format (and likely other forms of on-the-wire data), is that we don't have a neat bundle of bytes to check, you have a stream that you have to chew at, and test as you go, it could be 32 bytes, or it could be a lot longer with some crazy-long hash, but you'll find out when you: figure out if it's not a v0 so has some varints at the front that will tell you about the length.
This should no longer be an issue due to the way decoding is now handled. Closing this.
https://github.com/multiformats/cid#decoding-algorithm
When decoding a CIDv0 string we just try to decode with
base58btc
, when we should first check the string begins withQm
and is 46 characters long.