Stebalien commented 6 years ago

The resolver seems to assume that paths always resolve to CIDs (can't end up in the middle of an object).
We should provide a function for retrieving the namespace.
It should be possible to distinguish between mutable and immutable paths.
coreapi.ParsePath doesn't actually extract the root.
Resolved paths should probably use a different type.

Stebalien commented 6 years ago

Note: this also looks like a nice place to plugin a real namespaced resolver system (yay).

Stebalien commented 6 years ago

The resolver seems to assume that paths always resolve to CIDs (can't end up in the middle of an object).

magik6k commented 6 years ago

I haven't touched paths in CoreAPI yet, I'll try to make a draft PR tomorrow, addressing these and some other points.

Stebalien commented 6 years ago

Awesome!

Stebalien commented 6 years ago

Response to TODOs in #4672 but contains a lot if discussion on future work so I figured I'd put it here.

Namespaces

Figure out default namespaces

Getting this right is really important. I can see a few solutions but we need to discuss this thoroughly.

CID Namespace

That is, we could introduce a new /cid/ namespace that defines no path resolution algorithm. /cid/QmId would be valid but /cid/QmId/path would not. I think this is fine as I don't think QmId/path (no namespace) is permitted.

Not sure if I'm a fan of this...

Pro: Guess nothing.
Con: Doesn't handle IPNS.

IPLD Namespace

Again, given that we're not doing any path resolution, the namespace doesn't really matter (that much). IPLD is the most general purpose as all valid UnixFS objects are IPLD objects.

This would mean that the namespace is only concerned with path resolution and shouldn't be used to interpret the target value.

Pro: It allows one to, e.g., link to an IPFS file from IPLD and address it with /ipld/QmId/ipld/path/to/thing.
Pro: Very predictable.
Con: I'd kind of like ipfs cat /ipld/QmId/path/to/thing to return a CAR of the target dag. However, I'm fine with this not working.
Con: Could end up leaking /ipld/QmId to the user when they expect /ipfs/QmId (nasty).
Con: Doesn't handle IPNS.

Command Chooses

We could make the default namespace an option to ParsePath, yielding an error if no default namespace was specified and ParsePath was passed a raw "thing".

Alternatively, we could combine the IPLD namespace solution and this solution. If no default is passed, we default to IPLD.

We could even make this option a function with the signature func(string) Path. That would allow maximum flexibility (and allow the function to, possibly, spit out a warning).

Personally, I'd vote for this option (with the fallback to IPLD) as it has all the advantages of the IPLD option but is slightly more flexible.

Path Remainders

We may just want to set a "resolved" bit somewhere and use that to determine if the path has been resolved. We also need to address https://github.com/ipld/specs/issues/3

Mutable v. Immutable

We may, eventually, have other mutable namespaces (maybe?). I'd like to just provide a method Mutable(). Internally everything may just return false except IPNS but I don't think it'll cost us much and may make extending things easier in the future.

Resolution based on prefix

I'd really just like a global path resolution registry that maps namespaces to resolvers. But yeah, this can come later.

MFS

Ideally, I'd like to introduce the concept of "mounting" all of our namespaces inside MFS (although I haven't thought through all of the ramifications). Then, we'd just hand the path off to MFS and let it do the lookup. Although, we could also just have MFS "mount" all namespaces (i.e., mount the resolver).

(If I could break things, I would introduce /local/{ipfs,ipld}/ namespaces (or maybe just /local) to scope off all non-globally addressable stuff but... oh well.)

So... Let the debate begin!

magik6k commented 6 years ago

Having the command choose is something that should work fine - we can still give the user an option to choose which resolver to use if we add another Resolve[..] function, eliminating the need for ~~/ipns~~/ipld namespace:

ParsePath(string) just parses a string to a generic path
ResolveFile(Path) resolves a path using unixfs resolver
ResolveData(Path) resolves a path using ipld resolver
When a path is passed to a command and if it's not resolved, the command resolves it using resolver making most sense in it's context. If it's resolved it just uses what it got
ResolveFile can work in MFS context.
Can allow mixing resolvers, which may be useful in some scenarios:
- Parse(unixfspath) -> ResolveFile -> AppendPath(Path, strpath) -> ResolveData
- Just a random thought, may not do that if this is seen as too ugly

Pros:

Don't need more namespaces
Works with ipns
No hacks needed
Can map nicely to MFS

A con here is that paths can represent 2 different things depending on the resolver/command used.

Can be slightly confusing for users
~~I see this as a less of as issue than having separate /ipld namespace which support with ipns~~ I see this as a less of an issue than having separate /ipld namespace without support for ipns

(edit: cleaned up some wordings)

Stebalien commented 6 years ago

ResolveFile(Path) resolves a path using unixfs resolver.

Having two separate functions, one for IPFS and one for IPLD, both of them taking a path seems a bit odd. Won't the path have the namespace so shouldn't the choice be unambiguous at this point? Also, I'd like the resolver to be fully pluggable. Ideally, the top-level resolver wouldn't understand any of our pathing schemes, just the concept of pathing/namespaces (and maybe CIDs because they'll be baked into most of our pathing schemes).

eliminating the need for /ipns namespace.

I must be missing something here. We do not want to get rid of the /ipns namespace.

Don't need more namespaces.

I agree that more namespaces isn't the way to go. I mentioned the /cid namespace more for completeness. IMO, that's the "technically correct" approach but not really a usable approach in practice.

No hacks needed

I don't consider any of the solutions I provided to be hacks (well, the /cid namespace one is a bit funky but it's still consistent and well defined).

I see this as a less of as issue than having separate /ipld namespace which support with ipns

I'm not sure if I understand your point here. We plan on having a /ipld namespace regardless.

However, I guess I never really explained my concern with regard to IPNS... The issue with IPNS is that, given ParsePath("QmId..."), it's impossible to determine if QmId is a peer ID (IPNS key) or a v0 CID. All of my proposed schemes support resolving IPNS, they just (except for the last one) run into issues when asked to resolve QmId... (due to the lack of a namespace).

magik6k commented 6 years ago

eliminating the need for /ipns namespace.

I must be missing something here. We do not want to get rid of the /ipns namespace.

Oh, I meant /ipld, sorry for the confusion, typo..

The point is was that with /ipld we _{can't easily resolve IPNS names_{using _{IPLD_resolver.}}} So I dug into IPNS code to find that it works on path level, not on CIDs, this changes my perspective a bit.

However, I guess I never really explained my concern with regard to IPNS... The issue with IPNS is that, given ParsePath("QmId..."), it's impossible to determine if QmId is a peer ID (IPNS key) or a v0 CID. All of my proposed schemes support resolving IPNS, they just (except for the last one) run into issues when asked to resolve QmId... (due to the lack of a namespace).

The current default is to use /ipfs namespace by default. I think this is fine as long as this is documented. This is the behaviour of most of the commands (except ipfs name resolve..).

With the ability to publish /ipld paths to IPNS I think it's fine to go that way. To address some concerns from the comment above:

Could end up leaking /ipld/QmId to the user when they expect /ipfs/QmId (nasty).

The only API returning /ipld prefix would be the Dag API. If we keep the boundary between (linked-)data and files clearly defined, this shouldn't become a problem.

Con: Doesn't handle IPNS.

Well, IPNS can handle /ipld paths (it will need some changes to make it work, but it can on the protocol level). Only problem I can see here is that /ipns paths can get resolved in 2 different ways depending on what they resolve to, though I can't think of any example use-case that would be affected by that.

Stebalien commented 6 years ago

So... I omitted a ton of context in that proposal.

The current default is to use /ipfs namespace by default. I think this is fine as long as this is documented.

This becomes a problem with all the dag and block commands that usually operate over IPLD dags. This is actually a really nasty issue that has bitten us multiple times (mostly because we haven't yet implemented the /ipld namespace).

However, as the "IPLD namespace" proposal pointed out, we never have paths of the form QmId/path, only /namespace/QmId/path and QmId (no path). Therefore, we don't care about the path resolution algorithm except where IPNS is concerned (because, in this case, the QmId is not a CID). Therefore, the choice of IPLD/IPFS isn't really important when unspecified (as long as we're not dealing with IPNS). That proposal suggested /ipld because /ipld is more general purpose. However, at the end of the day, it doesn't matter because. Internally, the resolver won't "walk the path" because there will be no path (so either way will work).

Could end up leaking /ipld/QmId to the user when they expect /ipfs/QmId (nasty).

The only API returning /ipld prefix would be the Dag API. If we keep the boundary between (linked-)data and files clearly defined, this shouldn't become a problem.

My concern, in this case, was that some error message might state /ipld/QmId and confuse the user (even though the choice of /ipld or /ipfs doesn't really matter in this case). That is, in the "no path so we don't care the which path resolution algorithm we use" case, if we choose /ipld when the user expects /ipfs, some error message may tell the user that we tried to resolve /ipld/QmId and confuse them.

Note: Simply choosing /ipfs/QmId doesn't actually fix this as we'll have the reverse problems for commands like ipfs dag etc...

Con: Doesn't handle IPNS.

Well, IPNS can handle /ipld paths (it will need some changes to make it work, but it can on the protocol level).

All my comments about "doesn't handle IPNS" were referring to the fact that we can't determine id QmId... is an IPNS key or a CID and the correct interpretation depends on the command. The two proposals with the "doesn't handle IPNS" caveat didn't provide a way to specify a default namespce (no way for the command to tell the resolver "if you're passed a path in the form QmId..., use namespace X".

magik6k commented 6 years ago

All my comments about "doesn't handle IPNS" were referring to the fact that we can't determine id QmId... is an IPNS key or a CID and the correct interpretation depends on the command. The two proposals with the "doesn't handle IPNS" caveat didn't provide a way to specify a default namespce (no way for the command to tell the resolver "if you're passed a path in the form QmId..., use namespace X".

Currently you can't create a path without a prefix using CoreApi [1].

name.Resolve, which is afaik the only function which by-default prefixes things with /ipns/, currently is passed a string instead of a Path (on a second thought I'm not sure if it was a right decision).

You can't pass unprefixed path to any coreapi function. When porting the commands to use coreapi, the default prefix will have to be attached in the command layer. If we enforce this, then the QmId... shouldn't be a problem.

Stebalien commented 6 years ago

Currently you can't create a path without a prefix using CoreApi [1].

But you can call ParsePath("QmId...") (no prefix, just a string).

magik6k commented 6 years ago

But you can call ParsePath("QmId...") (no prefix, just a string).

Yes, the current default is to parse this as a CID and prefix with /ipfs/. We can make it return errors for unprefixed strings. This would give us the flexibility to, in future, interpret the paths in context of MFS without breaking the compatibility too badly.

magik6k commented 6 years ago

To push this somewhere:

This is how I see the /ipld variant:

ParseCid prefixes paths with /ipfs
ParsePath rejects unprefixed paths
dag, block apis return /ipld paths
name api returns /ipns paths
other apis return /ipfs paths

By explicitly disallowing unprefixed paths we don't have the ParsePath("QmId...") ambiguity. As this is an CoreAPI-only thing, we can still maintain current command behaviour quite easily.

Problems: MFS is weird in this case. We can either - add /mfs namespace (imo ugly) - or - pass string paths to files api (even uglier?)

But - what if we treated all unprefixed paths as MFS paths? In this case MFS would provide a resolver. We still can have /ipld paths.

Downsides:

More complicated in implementation (meh)
Keeping command compatibility may be harder
- We could check if the path is prefixed with / and if it's not, prepend some default. This is a bit hacky so if we decide to do this, it should be there for a limited time (we should deprecate unprefixed paths in commands as-well)
Any security implications?
Not sure about js-ipfs side, afaik it is not going to have the files api, at least not anytime soon

Stebalien commented 6 years ago

ParseCid prefixes paths with /ipfs

I'm not really comfortable with this as it implies that IPFS is somehow the "correct" namespace. Maybe rename the function to IpfsPath(cid) or NewPath(ns string, cid *cid.Cid)?

But - what if we treated all unprefixed paths as MFS paths? In this case MFS would provide a resolver. We still can have /ipld paths.

I'd almost just go with the "pass strings" option. For MFS, paths really are just strings. Internally, the files API would parse the string into a path and do any necessary resolution.

Personally, I'd have liked to have had a separate /local prefix for mfs but...

magik6k commented 6 years ago

So I did a bit of poking in https://github.com/ipfs/go-ipfs/pull/4672 locally. I think that /local for non-global (mfs) stuff might actually be fine. We aren't breaking anything since coreapi doesn't have files api yet, and when in future the files commands will be rewritten to copeapi we can wrap them to use the /local namespace.

I think I'll go with this and see how it turns out. Also, note that if we would want to use mfs as the global namespace, we should reserve /ipld and /local namespaces asap (it would be a breaking change)

ipfs / kubo

Think through the CoreAPI Path API and implementation. #4666

Namespaces

CID Namespace

IPLD Namespace

Command Chooses

Path Remainders

Mutable v. Immutable

Resolution based on prefix

MFS