ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/
Other
16.18k stars 3.02k forks source link

Think through the CoreAPI Path API and implementation. #4666

Closed Stebalien closed 6 years ago

Stebalien commented 6 years ago
Stebalien commented 6 years ago

Note: this also looks like a nice place to plugin a real namespaced resolver system (yay).

Stebalien commented 6 years ago

The resolver seems to assume that paths always resolve to CIDs (can't end up in the middle of an object).

Related to: https://github.com/ipld/specs/issues/3

magik6k commented 6 years ago

Somewhat related https://github.com/ipfs/go-ipfs/pull/4643#discussion_r165768700

I haven't touched paths in CoreAPI yet, I'll try to make a draft PR tomorrow, addressing these and some other points.

Stebalien commented 6 years ago

Awesome!

Stebalien commented 6 years ago

Response to TODOs in #4672 but contains a lot if discussion on future work so I figured I'd put it here.

Namespaces

Figure out default namespaces

Getting this right is really important. I can see a few solutions but we need to discuss this thoroughly.

CID Namespace

That is, we could introduce a new /cid/ namespace that defines no path resolution algorithm. /cid/QmId would be valid but /cid/QmId/path would not. I think this is fine as I don't think QmId/path (no namespace) is permitted.

Not sure if I'm a fan of this...

IPLD Namespace

Again, given that we're not doing any path resolution, the namespace doesn't really matter (that much). IPLD is the most general purpose as all valid UnixFS objects are IPLD objects.

This would mean that the namespace is only concerned with path resolution and shouldn't be used to interpret the target value.

Command Chooses

We could make the default namespace an option to ParsePath, yielding an error if no default namespace was specified and ParsePath was passed a raw "thing".

Alternatively, we could combine the IPLD namespace solution and this solution. If no default is passed, we default to IPLD.

We could even make this option a function with the signature func(string) Path. That would allow maximum flexibility (and allow the function to, possibly, spit out a warning).

Personally, I'd vote for this option (with the fallback to IPLD) as it has all the advantages of the IPLD option but is slightly more flexible.

Path Remainders

We may just want to set a "resolved" bit somewhere and use that to determine if the path has been resolved. We also need to address https://github.com/ipld/specs/issues/3

Mutable v. Immutable

We may, eventually, have other mutable namespaces (maybe?). I'd like to just provide a method Mutable(). Internally everything may just return false except IPNS but I don't think it'll cost us much and may make extending things easier in the future.

Resolution based on prefix

I'd really just like a global path resolution registry that maps namespaces to resolvers. But yeah, this can come later.

MFS

Ideally, I'd like to introduce the concept of "mounting" all of our namespaces inside MFS (although I haven't thought through all of the ramifications). Then, we'd just hand the path off to MFS and let it do the lookup. Although, we could also just have MFS "mount" all namespaces (i.e., mount the resolver).

(If I could break things, I would introduce /local/{ipfs,ipld}/ namespaces (or maybe just /local) to scope off all non-globally addressable stuff but... oh well.)


So... Let the debate begin!

magik6k commented 6 years ago

Having the command choose is something that should work fine - we can still give the user an option to choose which resolver to use if we add another Resolve[..] function, eliminating the need for /ipns/ipld namespace:

Pros:

A con here is that paths can represent 2 different things depending on the resolver/command used.

(edit: cleaned up some wordings)

Stebalien commented 6 years ago

ResolveFile(Path) resolves a path using unixfs resolver.

Having two separate functions, one for IPFS and one for IPLD, both of them taking a path seems a bit odd. Won't the path have the namespace so shouldn't the choice be unambiguous at this point? Also, I'd like the resolver to be fully pluggable. Ideally, the top-level resolver wouldn't understand any of our pathing schemes, just the concept of pathing/namespaces (and maybe CIDs because they'll be baked into most of our pathing schemes).

eliminating the need for /ipns namespace.

I must be missing something here. We do not want to get rid of the /ipns namespace.


Don't need more namespaces.

I agree that more namespaces isn't the way to go. I mentioned the /cid namespace more for completeness. IMO, that's the "technically correct" approach but not really a usable approach in practice.

No hacks needed

I don't consider any of the solutions I provided to be hacks (well, the /cid namespace one is a bit funky but it's still consistent and well defined).

I see this as a less of as issue than having separate /ipld namespace which support with ipns

I'm not sure if I understand your point here. We plan on having a /ipld namespace regardless.

However, I guess I never really explained my concern with regard to IPNS... The issue with IPNS is that, given ParsePath("QmId..."), it's impossible to determine if QmId is a peer ID (IPNS key) or a v0 CID. All of my proposed schemes support resolving IPNS, they just (except for the last one) run into issues when asked to resolve QmId... (due to the lack of a namespace).

magik6k commented 6 years ago

eliminating the need for /ipns namespace.

I must be missing something here. We do not want to get rid of the /ipns namespace.

Oh, I meant /ipld, sorry for the confusion, typo..

The point is was that with /ipld we can't easily resolve IPNS names using IPLD resolver. So I dug into IPNS code to find that it works on path level, not on CIDs, this changes my perspective a bit.

However, I guess I never really explained my concern with regard to IPNS... The issue with IPNS is that, given ParsePath("QmId..."), it's impossible to determine if QmId is a peer ID (IPNS key) or a v0 CID. All of my proposed schemes support resolving IPNS, they just (except for the last one) run into issues when asked to resolve QmId... (due to the lack of a namespace).

The current default is to use /ipfs namespace by default. I think this is fine as long as this is documented. This is the behaviour of most of the commands (except ipfs name resolve..).

With the ability to publish /ipld paths to IPNS I think it's fine to go that way. To address some concerns from the comment above:

Could end up leaking /ipld/QmId to the user when they expect /ipfs/QmId (nasty).

The only API returning /ipld prefix would be the Dag API. If we keep the boundary between (linked-)data and files clearly defined, this shouldn't become a problem.

Con: Doesn't handle IPNS.

Well, IPNS can handle /ipld paths (it will need some changes to make it work, but it can on the protocol level). Only problem I can see here is that /ipns paths can get resolved in 2 different ways depending on what they resolve to, though I can't think of any example use-case that would be affected by that.

Stebalien commented 6 years ago

So... I omitted a ton of context in that proposal.

The current default is to use /ipfs namespace by default. I think this is fine as long as this is documented.

This becomes a problem with all the dag and block commands that usually operate over IPLD dags. This is actually a really nasty issue that has bitten us multiple times (mostly because we haven't yet implemented the /ipld namespace).

However, as the "IPLD namespace" proposal pointed out, we never have paths of the form QmId/path, only /namespace/QmId/path and QmId (no path). Therefore, we don't care about the path resolution algorithm except where IPNS is concerned (because, in this case, the QmId is not a CID). Therefore, the choice of IPLD/IPFS isn't really important when unspecified (as long as we're not dealing with IPNS). That proposal suggested /ipld because /ipld is more general purpose. However, at the end of the day, it doesn't matter because. Internally, the resolver won't "walk the path" because there will be no path (so either way will work).

Could end up leaking /ipld/QmId to the user when they expect /ipfs/QmId (nasty).

The only API returning /ipld prefix would be the Dag API. If we keep the boundary between (linked-)data and files clearly defined, this shouldn't become a problem.

My concern, in this case, was that some error message might state /ipld/QmId and confuse the user (even though the choice of /ipld or /ipfs doesn't really matter in this case). That is, in the "no path so we don't care the which path resolution algorithm we use" case, if we choose /ipld when the user expects /ipfs, some error message may tell the user that we tried to resolve /ipld/QmId and confuse them.

Note: Simply choosing /ipfs/QmId doesn't actually fix this as we'll have the reverse problems for commands like ipfs dag etc...

Con: Doesn't handle IPNS.

Well, IPNS can handle /ipld paths (it will need some changes to make it work, but it can on the protocol level).

All my comments about "doesn't handle IPNS" were referring to the fact that we can't determine id QmId... is an IPNS key or a CID and the correct interpretation depends on the command. The two proposals with the "doesn't handle IPNS" caveat didn't provide a way to specify a default namespce (no way for the command to tell the resolver "if you're passed a path in the form QmId..., use namespace X".

magik6k commented 6 years ago

All my comments about "doesn't handle IPNS" were referring to the fact that we can't determine id QmId... is an IPNS key or a CID and the correct interpretation depends on the command. The two proposals with the "doesn't handle IPNS" caveat didn't provide a way to specify a default namespce (no way for the command to tell the resolver "if you're passed a path in the form QmId..., use namespace X".

Currently you can't create a path without a prefix using CoreApi [1].

name.Resolve, which is afaik the only function which by-default prefixes things with /ipns/, currently is passed a string instead of a Path (on a second thought I'm not sure if it was a right decision).

You can't pass unprefixed path to any coreapi function. When porting the commands to use coreapi, the default prefix will have to be attached in the command layer. If we enforce this, then the QmId... shouldn't be a problem.

Stebalien commented 6 years ago

Currently you can't create a path without a prefix using CoreApi [1].

But you can call ParsePath("QmId...") (no prefix, just a string).

magik6k commented 6 years ago

But you can call ParsePath("QmId...") (no prefix, just a string).

Yes, the current default is to parse this as a CID and prefix with /ipfs/. We can make it return errors for unprefixed strings. This would give us the flexibility to, in future, interpret the paths in context of MFS without breaking the compatibility too badly.

magik6k commented 6 years ago

To push this somewhere:

This is how I see the /ipld variant:

By explicitly disallowing unprefixed paths we don't have the ParsePath("QmId...") ambiguity. As this is an CoreAPI-only thing, we can still maintain current command behaviour quite easily.

Problems: MFS is weird in this case. We can either - add /mfs namespace (imo ugly) - or - pass string paths to files api (even uglier?)

But - what if we treated all unprefixed paths as MFS paths? In this case MFS would provide a resolver. We still can have /ipld paths.

Downsides:

Stebalien commented 6 years ago

ParseCid prefixes paths with /ipfs

I'm not really comfortable with this as it implies that IPFS is somehow the "correct" namespace. Maybe rename the function to IpfsPath(cid) or NewPath(ns string, cid *cid.Cid)?

But - what if we treated all unprefixed paths as MFS paths? In this case MFS would provide a resolver. We still can have /ipld paths.

I'd almost just go with the "pass strings" option. For MFS, paths really are just strings. Internally, the files API would parse the string into a path and do any necessary resolution.

Personally, I'd have liked to have had a separate /local prefix for mfs but...

magik6k commented 6 years ago

So I did a bit of poking in https://github.com/ipfs/go-ipfs/pull/4672 locally. I think that /local for non-global (mfs) stuff might actually be fine. We aren't breaking anything since coreapi doesn't have files api yet, and when in future the files commands will be rewritten to copeapi we can wrap them to use the /local namespace.

I think I'll go with this and see how it turns out. Also, note that if we would want to use mfs as the global namespace, we should reserve /ipld and /local namespaces asap (it would be a breaking change)