ipfs / ipget

Retrieve files over IPFS and save them locally.
MIT License
390 stars 54 forks source link

Add input cleaner #51

Closed djdv closed 5 years ago

djdv commented 6 years ago

So that input such as http://localhost:8080/ipfs/QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g, https://ipfs.io/ipfs/QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g, etc. may be passed in and turned into proper references (/ipfs/QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g) If nothing is found the input string is returned so that it may pass or fail by path.ParsePath later.

Blocked by: https://github.com/ipfs/ipget/pull/50 submitting now so I don't forget.

djdv commented 5 years ago

Specifically I originally wanted this to do this: https://github.com/djdv/go-ipfs/commit/0d250ccd097c949f60930eeefb0cd197e20dac24

djdv commented 5 years ago

Pinging @Stebalien for opinions on this. I feel like it's more likely that people are going to have url's on their clipboard and in their scripts than bare canonical paths. They should probably be supported as valid input.

Stebalien commented 5 years ago

I'm really wary of stuff like this. What if we were to match URLs only? That is, check:

  1. Does it start with /.
  2. Does it start with a CID.
  3. Does it start with https?://foo.com/(ipfs|ipns)/...?
djdv commented 5 years ago

I changed this so that we first try to use go-path's parser which should handle 1 & 2, and return early. If that fails, we check if the path looks like a URL and pass the found string to go-path again.

Regex changed ^.*(/ip(f|n)s/.+)$ (group anything in the string that looks like an ipfs/ipns path) -> ^https?.*(/(ipfs|ipns|ipld)/.+)$ (specifically try to clamp to HTTP gateways, added IPLD)

Added a simple test as well. (Ignore the force push spam.)

djdv commented 5 years ago

The CI was not happy about the lack of execute permission on the test script file. This should be good for real now (assuming it passes).

Edit: I lied I left an additional test in there by mistake

djdv commented 5 years ago

~~This may be a better variant of the regex string. ^https?://.*(/(?:ipfs|ipns|ipld)/.{46}(?:/.+)?$) match HTTP, check for {/namespace/}, {fixed length for cid} {optional subpath}~~

If that's too specific ^https?.*(/(ipfs|ipns|ipld)/.+)$ should still probably be changed to ^https?://.*(/(?:ipfs|ipns|ipld)/.+)$ (add ://, and don't capture the namespace, we have no need for that) Edit: pushed the latter, waiting for judgment on the former. I'm not certain about the length of CID's being fixed. Double edit: I forgot it doesn't matter since we're passing it to go-path anyway which will error out if it is a false positive. Too much caffeine.

djdv commented 5 years ago

What if we just parsed the input as a URL and took the path?

Much better!

I also added the handling for these https://github.com/ipfs-shipyard/ipfs-companion/pull/533 ipfs://QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g and it seems fine. But I'm not sure if this is wanted or not. (revertible)

Edit: Forced over this: https://github.com/ipfs/ipget/commit/0af4734fc89a60c3c01de7c43c81b99ccb0d0db0#diff-b9cfc7f2cdf78a7f4b91a753d10865a2