Closed skorokithakis closed 6 years ago
Hi @skorokithakis, I checked the implementation. In this case, we default to checking whether the string is a valid CIDv0.
Since CIDv0 is base58-encoded and the legal alphabet is 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz, these strings are valid base58-encoded CIDs, unless I am missing something here.
Does that make sense? If, however, you do something like cid.is_cid("!"), you should see a ValueError. The ValueError is raised because this case is not handled yet. I should be handling it; I will fix this and make a release.
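To illustrate why an alphabet-only check accepts almost any short string, here is a minimal sketch. The helper name looks_like_base58 is hypothetical and is not part of py-cid; it only mirrors the reasoning above, not the library's actual code.

```python
# The base58 alphabet quoted above: it omits 0, O, I and l.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def looks_like_base58(value: str) -> bool:
    """Hypothetical helper: True if every character is a legal base58 character."""
    return bool(value) and all(ch in BASE58_ALPHABET for ch in value)


if __name__ == "__main__":
    # "a" and "hash" use only legal characters, so they pass; "!" does not.
    for sample in ["a", "hash", "!"]:
        print(repr(sample), looks_like_base58(sample))
```

A check like this says nothing about whether the decoded bytes form a multihash, which is why it returns True for strings that IPFS would never generate.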
Ah, I see, so basically it's more generic than I thought. I wanted to check that the hashes were valid IPFS hashes. Do you know if there is any less general check I could run to further ensure that they're IPFS-generated? I'm looking for a validation function that will be as strict as possible while still covering all IPFS-generated hashes.
I am not sure how helpful this will be, but we have a function that makes a CID object from a string, and it fails under certain scenarios. Maybe a helper function on top of it would help.
This is the function; the documentation mentions the scenarios in which it raises a ValueError: https://py-cid.readthedocs.io/en/stable/api_reference.html#cid.make_cid
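A helper of the kind suggested here could be sketched as follows. The name is_parseable_cid is hypothetical, and the parser is passed in as an argument so the snippet runs without py-cid installed; it assumes only that the parser raises ValueError on bad input, as the linked documentation describes for make_cid.

```python
from typing import Callable


def is_parseable_cid(value: str, parser: Callable[[str], object]) -> bool:
    """Hypothetical helper: True if `parser` accepts `value` without raising ValueError."""
    try:
        parser(value)
    except ValueError:
        return False
    return True


# Intended use, assuming py-cid is installed:
#     import cid
#     is_parseable_cid("some string", cid.make_cid)
```

Such a wrapper is only as strict as the parser it wraps, so it rejects exactly the inputs that the parser itself rejects.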
That helps, thanks!
Cool, you are welcome!
This issue should probably be reopened, as it looks like it's actually buggy. For example, CIDs should be more than 3 bytes long, and this tool shows that many hashes that py-cid considers valid are actually invalid. This looks like something that would be very handy for a test suite (i.e. valid vs. invalid CIDs).
CCing @lidel here.
@dhruvbaldawa I am afraid this is still broken, as noted by @skorokithakis:
a is not a valid CID, so is_cid('a') should return False
make_cid('a') should throw an error saying that it is impossible to have a multihash shorter than 3 bytes, just like go-ipfs and http://cid-utils.ipfs.team do (the latter uses https://github.com/ipld/js-cid internally)
Sadly I don't have free bandwidth to implement fixes, but I've added tests in https://github.com/ipld/py-cid/pull/20 with valid and invalid CID samples that should help with identifying what is missing. Hope it helps :)
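For the CIDv0 case specifically, a stricter check can be sketched without any libraries: decode the base58 string and require the 34-byte sha2-256 multihash layout (0x12, 0x20, then a 32-byte digest). The function names below are hypothetical and this is not how py-cid, go-ipfs, or js-cid implement it. It deliberately rejects CIDv1 strings, so it does not cover every IPFS-generated identifier.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a base58btc string; raises ValueError on illegal characters."""
    number = 0
    for ch in text:
        index = BASE58_ALPHABET.find(ch)
        if index < 0:
            raise ValueError(f"invalid base58 character: {ch!r}")
        number = number * 58 + index
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' in the text stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def b58encode(data: bytes) -> str:
    """Encode bytes as base58btc (only used here to build a test value)."""
    number = int.from_bytes(data, "big")
    chars = []
    while number > 0:
        number, remainder = divmod(number, 58)
        chars.append(BASE58_ALPHABET[remainder])
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def is_strict_cidv0(text: str) -> bool:
    """Hypothetical check: base58 text that decodes to a sha2-256 multihash."""
    try:
        raw = b58decode(text)
    except ValueError:
        return False
    # 0x12 = sha2-256 code, 0x20 = digest length (32), then the digest itself.
    return len(raw) == 34 and raw[0] == 0x12 and raw[1] == 0x20


if __name__ == "__main__":
    sample = b58encode(b"\x12\x20" + hashlib.sha256(b"hello").digest())
    for candidate in [sample, "a", "hash", "!"]:
        print(candidate, is_strict_cidv0(candidate))
```

With this check, 'a' and 'hash' are rejected because they decode to far fewer than 34 bytes, and '!' is rejected because it is not in the alphabet.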
@lidel thanks, this should help. I am trying to understand the JS implementation to see what we are missing here.
@dhruvbaldawa js-cid is good prior art, but if something is unclear, the source of truth is https://github.com/ipld/cid#decoding-algorithm
Yes, I went through the code and realized that I am missing one key part here, which is this - https://github.com/ipld/js-cid/blob/master/src/cid-util.js#L32
Unfortunately, the current multihash library does not have a validate function, so I am integrating another library.
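As a rough idea of what such a validation step involves, here is a minimal sketch of a structural multihash check: read the hash-function code and the digest length as unsigned varints, then confirm that the remaining bytes match the declared length. The names read_uvarint and validate_multihash are hypothetical and are not the API of any Python multihash package. Unlike js-multihash, this sketch does not look the code up in a table of known hash functions.

```python
from typing import Tuple


def read_uvarint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint; return (value, next_offset)."""
    value = 0
    shift = 0
    for position in range(offset, len(data)):
        byte = data[position]
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            # High bit clear: this was the last byte of the varint.
            return value, position + 1
        shift += 7
    raise ValueError("truncated varint")


def validate_multihash(raw: bytes) -> None:
    """Hypothetical structural check; raises ValueError if `raw` is malformed."""
    if len(raw) < 3:
        raise ValueError("multihash too short, must be at least 3 bytes")
    _code, offset = read_uvarint(raw)
    length, offset = read_uvarint(raw, offset)
    digest = raw[offset:]
    if len(digest) != length:
        raise ValueError(
            f"digest length mismatch: declared {length}, got {len(digest)}"
        )
```

A check along these lines is what rejects inputs such as a single decoded byte, which an alphabet-only test lets through.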
Update: It will take some time for me to completely resolve the issue. I have most things fixed, but a few tests are failing because of a bug in the implementation of multibase, so I am working on fixing that.
Ah okay, thank you for the heads up!
@skorokithakis I have released a new version today, can you please check?
I'm trying, unfortunately the py-multiformats packages are being way too strict and requiring six==1.10.0, which breaks my package manager, as another project is requiring 1.11.0, so nothing can be installed... I'll get back to you if I manage the installation, thank you.
cid.is_cid("hash") crashes, unfortunately.
I am sorry. I have incorporated your hypothesis tests and also added a bunch of other fixes in the related multiformats repos. I can see hypothesis consistently passing the tests; can you have a look?
I have made another release; you can just install it and verify. The requirements have also been relaxed.
Yep, everything works, thank you!
Is this supposed to almost always return True? I was instructed by people in the IRC channel to file an issue.