ipld / py-cid

Self-describing content-addressed identifiers for distributed systems implementation in Python
MIT License

is_cid seems to always return True #17

Closed skorokithakis closed 6 years ago

skorokithakis commented 6 years ago
In [3]: cid.is_cid("somestuff")
Out[3]: True

In [4]: cid.is_cid("1")
Out[4]: True

Is this supposed to almost always return True? I was instructed by people in the IRC channel to file an issue.

dhruvbaldawa commented 6 years ago

Hi @skorokithakis, I checked the implementation. In this case, we default to checking if this is a valid CIDv0 string or not.

And since CIDv0 is base58-encoded and the legal characters of that alphabet are 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz, these strings are valid base58-encoded CIDs, unless I am missing something here.

Does that make sense? However, if you do something like cid.is_cid("!") you should see a ValueError. It is raised because this case is not handled yet. I should be handling it; I will fix this and make a release.
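To illustrate the explanation above: if a check only asks whether every character belongs to the base58 alphabet, short strings such as "somestuff" and "1" pass. The following is a minimal sketch of such an alphabet-only test, not py-cid's actual implementation; the name `is_base58` is hypothetical.

```python
# Base58 alphabet as quoted in the comment above (no 0, O, I, or l).
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def is_base58(text):
    """Return True if text is non-empty and uses only base58 characters.

    Hypothetical helper for illustration; it says nothing about whether
    the decoded bytes form a valid CID.
    """
    return bool(text) and all(ch in BASE58_ALPHABET for ch in text)


for sample in ("somestuff", "1", "!"):
    print(repr(sample), is_base58(sample))
```

An alphabet-only test accepts the first two samples and rejects "!", which matches the behaviour reported in this issue.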

skorokithakis commented 6 years ago

Ah, I see, so basically it's more generic than I thought. I wanted to check that the hashes were valid IPFS hashes. Do you know if there is any less general check I could run to further ensure that they're IPFS-generated? I'm looking for a validation function that will be as strict as possible while still covering all IPFS-generated hashes.

dhruvbaldawa commented 6 years ago

I am not sure how helpful this will be, but we have a function that makes a CID object from a string, and it fails under certain scenarios. Maybe a helper function on top of it would help; I am not sure.

This is the function, the documentation mentions the scenarios in which it raises a ValueError - https://py-cid.readthedocs.io/en/stable/api_reference.html#cid.make_cid
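A helper along the lines suggested above could be sketched as follows. It assumes only what this comment states: that the parsing function raises ValueError for input it rejects. The name `is_parseable_cid` is hypothetical, and the parser is passed in as an argument so the sketch does not depend on a particular py-cid release.

```python
def is_parseable_cid(value, parse):
    """Return True if parse(value) succeeds, False if it raises ValueError.

    `parse` is expected to be a callable such as cid.make_cid; which
    inputs it rejects depends on the installed py-cid version.
    """
    try:
        parse(value)
    except ValueError:
        return False
    return True


# Intended usage with py-cid installed (not executed here):
#   import cid
#   is_parseable_cid("somestuff", cid.make_cid)
```

This is only as strict as the parser itself, so it does not help if the parser accepts strings it should reject.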

skorokithakis commented 6 years ago

That helps, thanks!

dhruvbaldawa commented 6 years ago

Cool, you are welcome!

skorokithakis commented 6 years ago

This issue should probably be reopened, as it looks like it's actually buggy. For example, CIDs should be more than 3 bytes long, and this tool shows that many hashes that py-cid considers valid are actually invalid. This looks like something that would be very handy for a test suite (i.e. valid vs. invalid CIDs).

skorokithakis commented 6 years ago

CCing @lidel here.

lidel commented 6 years ago

@dhruvbaldawa I am afraid this is still broken, as noted by @skorokithakis:

Sadly I don't have free bandwidth to implement fixes, but I've added tests in https://github.com/ipld/py-cid/pull/20 with valid and invalid CID samples that should help with identifying what is missing. Hope it helps :)

dhruvbaldawa commented 6 years ago

@lidel thanks, this should help. I am trying to understand the JS implementation to see what we are missing here.

lidel commented 6 years ago

@dhruvbaldawa js-cid is good prior art, but if something is unclear, the source of truth is https://github.com/ipld/cid#decoding-algorithm

dhruvbaldawa commented 6 years ago

Yes, I went through the code and realized that I am missing one key part here, which is this - https://github.com/ipld/js-cid/blob/master/src/cid-util.js#L32

Unfortunately, the current multihash library does not have a validate function, so I am integrating another library.
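For readers following along, the missing piece is a check on the decoded bytes rather than on the characters. Below is a rough sketch of what validating a CIDv0 string could look like, assuming the usual CIDv0 layout: a base58btc-encoded sha2-256 multihash, i.e. hash code 0x12, digest length 0x20, then 32 digest bytes. All function names are hypothetical, and this is not the code used by js-cid, py-cid, or any multihash library.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text):
    """Decode a base58btc string to bytes; raise ValueError on bad characters."""
    num = 0
    for ch in text:
        idx = ALPHABET.find(ch)
        if idx < 0:
            raise ValueError("invalid base58 character: %r" % ch)
        num = num * 58 + idx
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Each leading '1' stands for one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def b58encode(data):
    """Encode bytes as base58btc (used here only to build a sample)."""
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def looks_like_cidv0(text):
    """Check that text decodes to a 34-byte sha2-256 multihash."""
    try:
        raw = b58decode(text)
    except ValueError:
        return False
    return len(raw) == 34 and raw[0] == 0x12 and raw[1] == 0x20


# Build a sample multihash from a real sha2-256 digest.
sample = b58encode(bytes([0x12, 0x20]) + hashlib.sha256(b"hello").digest())

for candidate in (sample, "somestuff", "1", "hash", "!"):
    print(candidate, looks_like_cidv0(candidate))
```

With a length and prefix check on the decoded bytes, short strings such as "somestuff", "1", and "hash" are rejected even though every character is legal base58. CIDv1 strings need the multibase-aware decoding described in the spec linked above and are out of scope for this sketch.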

dhruvbaldawa commented 6 years ago

Update: It will take some time for me to completely resolve the issue. I have most things fixed, but a few tests are failing because of a bug in the multibase implementation, so I am working on fixing that.

skorokithakis commented 6 years ago

Ah okay, thank you for the heads up!

dhruvbaldawa commented 6 years ago

@skorokithakis I have released a new version today, can you please check?

skorokithakis commented 6 years ago

I'm trying; unfortunately, the py-multiformats packages are being way too strict and requiring six==1.10.0, which breaks my package manager because another project requires 1.11.0, so nothing can be installed... I'll get back to you if I manage the installation, thank you.

skorokithakis commented 6 years ago

cid.is_cid("hash") crashes, unfortunately.

dhruvbaldawa commented 6 years ago

I am sorry. I have incorporated your hypothesis tests and also added a bunch of other fixes in the related multiformats repos. I can see hypothesis consistently passing the tests; can you have a look?

dhruvbaldawa commented 6 years ago

I have made another release; you can just install it and verify. The requirements have also been relaxed.

skorokithakis commented 6 years ago

Yep, everything works, thank you!