ipld / specs

Content-addressed, authenticated, immutable data structures
Slice paths #84

Open Stebalien opened 5 years ago

Stebalien commented 5 years ago

We currently support links to array items but not links to array slices: /ipld/QmID/10..20 (not using 10:20 to be nice to Windows users).

We should seriously consider adding support for these as most modern programming languages support subslicing arrays.
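As a minimal sketch of what resolving such a segment could look like: the function names below are hypothetical, and the half-open interpretation (end index excluded, as in Python or Go slices) is an assumption, since the proposal does not say whether 20 is included.

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar for the proposed "start..end" segment.
_SLICE_RE = re.compile(r"(\d+)\.\.(\d+)")


def parse_slice_segment(segment: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) if the segment looks like "10..20", else None."""
    m = _SLICE_RE.fullmatch(segment)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def resolve_segment(array: list, segment: str):
    """Resolve one path segment against an array node.

    Assumption: the range is half-open, i.e. "10..20" selects indexes 10-19.
    """
    bounds = parse_slice_segment(segment)
    if bounds is not None:
        start, end = bounds
        return array[start:end]
    # Plain index, which is what paths already support today.
    return array[int(segment)]
```

With this sketch, `/ipld/QmID/10..20` would resolve the last segment to a sub-array, while `/ipld/QmID/10` keeps its current meaning.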

mikeal commented 5 years ago

It would be nice to have a unified syntax for ranges of numeric indexes as well as string indexes when we have ordered collections.

Stebalien commented 5 years ago

Hm. Yeah, that would actually be really nice. It's just tricky to specify the syntax given that we support arbitrary string keys.

mikeal commented 5 years ago

It's not as pretty, but we could do something like /ipld/QmID/:slice(10,20), /ipld/QmID/:slice("start","end")

Stebalien commented 5 years ago

:slice("start","end") is a valid map key (at the moment at least).
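To make the collision concrete, here is a small sketch. The regular expression is a hypothetical grammar for the numeric form only, and `interpretations` is an illustrative helper, not part of any IPLD library.

```python
import re

# Hypothetical grammar for the proposed ":slice(start,end)" segment
# (numeric arguments only, for brevity).
SLICE_CALL = re.compile(r":slice\((\d+),(\d+)\)")


def interpretations(node: dict, segment: str) -> list:
    """List every way a segment could be read against a map node."""
    found = []
    if segment in node:
        found.append("map key")
    if SLICE_CALL.fullmatch(segment):
        found.append("slice operator")
    return found


# Nothing stops a map from using the operator syntax as a literal key.
node = {":slice(10,20)": "a literal value stored under this key"}
print(interpretations(node, ":slice(10,20)"))
```

A segment that yields both readings cannot be resolved without either reserving the syntax or escaping the literal key.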

We could also try defining an escaping system. Really, that might be best as it would allow us to path through /. However, I'm not sure what we'll want to reserve to support this.

Note: This is getting into the domain of IPLD selectors.

mikeal commented 5 years ago

For practical reasons we should (possibly already do) follow the escaping and reserved character rules of URLs.

; / ? : @ = & are all reserved characters that should be escaped if used in key names. We could do something along the lines of /ipld/QmID?slice=10,20

vmx commented 5 years ago

I like the idea of paths being more like URLs rather than file system paths.

Stebalien commented 5 years ago

Let's be careful not to make this too complicated. We'll have actual IPLD selectors for complicated queries.

I only suggested slice queries for arrays because most programming languages can do this out of the box. That is, I'm trying to make paths as powerful as (but not significantly more than) pointers.

> We could do something along the lines of /ipld/QmID?slice=10,20

> I like the idea of paths being more like URLs rather than file system paths.

It needs to be a path, e.g. /ipld/QmID/something. The whole "everything is a path" concept in ipfs, libp2p, multiaddrs, ipld, etc. is really important (paths are recursive and composable).

> For practical reasons we should (possibly already do) follow the escaping and reserved character rules of URLs.

While prevalent, URL escaping:

  1. Is not user/dev friendly at all (%2f?).
  2. Is case insensitive while the rest of IPLD paths are case sensitive.
  3. Will make it harder to interoperate with browsers and web servers, as these tend to unescape things. For example, the default/only Go HTTP request router will turn /ipld/QmID/%2fsomething%2f into /ipld/QmID//something/ and then redirect the user to /ipld/QmID/something/ to get rid of the extra slash.
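The effect in points 2 and 3 can be illustrated with Python's standard library. This is only an analogy for what an intermediary might do; it is not the Go router, and `posixpath.normpath` also drops the trailing slash, which the Go redirect described above does not.

```python
import posixpath
from urllib.parse import unquote

raw = "/ipld/QmID/%2fsomething%2f"


def segments_unescaped_late(path: str) -> list:
    """Split on "/" first, then unescape each segment (preserves the key)."""
    return [unquote(s) for s in path.split("/")]


# An intermediary that unescapes the whole path before routing
# turns the escaped slashes into real separators.
unescaped_early = unquote(raw)

# A subsequent clean-up pass then collapses the doubled slash.
normalized = posixpath.normpath(unescaped_early)

print(segments_unescaped_late(raw))  # the key "/something/" survives
print(unescaped_early)
print(normalized)

# Percent-escapes are case insensitive: both spellings decode to "/".
same = unquote("%2f") == unquote("%2F")
```

Once the path has been unescaped early, the original key `/something/` can no longer be recovered from it.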

We may still want to go with URL escaping because that's what everyone uses but we should be very careful here (and consider just using a backslash).

> For practical reasons we should (possibly already do) follow the escaping and reserved character rules of URLs.

Remember, we have to make this work with UnixFS (which supports all of these special characters except /).

mikeal commented 5 years ago

> We may still want to go with URL escaping because that's what everyone uses but we should be very careful here (and consider just using a backslash).

The primary motivations for adopting URL constraints are:

If we were to create our own constraints and escaping we'd have to:

This is obviously outside the scope of how we handle this specific problem (slice paths), but if we already have a set of reserved characters from adopting URL constraints, I'd prefer not to reserve additional ones.

> Remember, we have to make this work with UnixFS (which supports all of these special characters except /).

You can still use any character for a property name, even reserved characters, they just need to be escaped. This will be true for any additional characters we'd want to reserve for any of the other forms we're proposing for slices.

> It needs to be a path, e.g. /ipld/QmID/something. The whole "everything is a path" concept in ipfs, libp2p, multiaddrs, ipld, etc. is really important (paths are recursive and composable).

Search params are part of the path, in URL terms, and are part of the cachable key for a resource, so I don't necessarily agree that this form isn't "part of the path" but looking at it now I agree that this is the wrong form.

Since each path part is a nested property putting a search param on one part that is actually referring to sub-properties does seem "wrong."

An alternative that would not have this problem but is possibly too ugly to consider would be /ipld/QmID/something/?range=10,20. Technically, this is a different URL from the other form and is different in the browser's URL parser but I'm a little concerned that naive parsers might strip the trailing slash.

Looking back on this syntax, I'm actually a bit worried about it being misinterpreted, because it's a little too familiar to developers and means something else. I'm back to preferring something like /ipld/QmID/something/:range(10,20). The : character is reserved and, assuming we adopt URL constraints, should be escaped when used in a valid key.
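A minimal sketch of how that could work, assuming URL-style percent-escaping for literal keys and a leading `:` marking an operator. `encode_key`, `decode_segment`, and the `:range` grammar are hypothetical names used for illustration.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical grammar for the proposed ":range(start,end)" segment.
RANGE_CALL = re.compile(r":range\((\d+),(\d+)\)")


def encode_key(key: str) -> str:
    """Percent-escape a literal key so it can never start with ':'."""
    return quote(key, safe="")


def decode_segment(segment: str):
    """Classify a path segment as a range operator or a literal key."""
    m = RANGE_CALL.fullmatch(segment)
    if m:
        return ("range", int(m.group(1)), int(m.group(2)))
    if segment.startswith(":"):
        # ':' is reserved, so an unescaped one must be a known operator.
        raise ValueError(f"unknown operator segment: {segment!r}")
    return ("key", unquote(segment))
```

Under these assumptions, `decode_segment(":range(10,20)")` is read as an operator, while a map key with the same spelling must be written in its escaped form and decodes back to the literal key.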

Stebalien commented 5 years ago

> Well understood and specified.

URL escaping is not well understood, it's complicated as hell (see https://golang.org/src/net/url/url.go). The rule isn't "forbid ; / ? : @ = &". Instead, there are different rules for the origin, path, query, etc. (some of which also forbid characters such as $ and ,).
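The component-dependent rules are easy to see with Python's `urllib.parse` (used here as a stand-in; Go's `net/url` has its own, similar set of modes). The same key escapes three different ways depending on where it is assumed to live:

```python
from urllib.parse import quote, quote_plus

key = "a b/c"

# Escaped as a whole path: "/" is treated as a separator and left alone.
as_path = quote(key)

# Escaped as a single path segment: "/" must be escaped too.
as_segment = quote(key, safe="")

# Escaped as a query value: space becomes "+" instead of "%20".
as_query = quote_plus(key)

print(as_path, as_segment, as_query)
```

Any spec that says "use URL escaping" would have to state which of these modes applies to IPLD path segments.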

> If we were to create our own constraints and escaping we'd have to:

If we only need to escape /, this shouldn't be difficult at all. Really, we can probably just borrow from other systems and use \\ for \ and \/ for /. That gives us the added benefit of having a single encoding for every path (modulo the multibase CID).
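A minimal sketch of that backslash scheme, assuming only `\` and `/` are ever escaped. The function names are hypothetical. Rejecting every other escape sequence is what gives each path a single encoding.

```python
def escape_segment(key: str) -> str:
    """Escape a key for use as one path segment: \\ -> \\\\ and / -> \\/."""
    # Backslashes must be doubled first so the ones added for "/" are not.
    return key.replace("\\", "\\\\").replace("/", "\\/")


def join_path(keys: list) -> str:
    """Join keys into a path, escaping each one."""
    return "/".join(escape_segment(k) for k in keys)


def split_path(path: str) -> list:
    """Split a path on unescaped "/" and unescape each segment."""
    segments, current = [], []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\":
            # Only "\\\\" and "\\/" are valid; anything else is an error,
            # so there is exactly one way to write any given key.
            if i + 1 >= len(path) or path[i + 1] not in ("\\", "/"):
                raise ValueError(f"invalid escape at offset {i}")
            current.append(path[i + 1])
            i += 2
        elif ch == "/":
            segments.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    segments.append("".join(current))
    return segments
```

For example, the keys `a/b` and `c\d` round-trip through `join_path` and `split_path` unchanged, and characters such as `?`, `:`, and `%` need no special treatment.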

I'm not saying we shouldn't consider %2f (backslash escaping has its own issues), but simply adopting URL escaping won't solve our issues without introducing a bunch of new ones.

> You can still use any character for a property name, even reserved characters, they just need to be escaped. This will be true for any additional characters we'd want to reserve for any of the other forms we're proposing for slices.

They need to be escaped in paths. Escaping field names in data structures is not an option at this point.

The tricky part here is that we'd need to take a path of the form /ipfs/Qm.../asdf?asdf=x (we have files like this) and then make sure to carefully escape it when using it as an IPLD path.
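A sketch of the problem and of the careful escaping it would require, assuming URL-style rules were adopted. `QmID` is the placeholder CID used elsewhere in this thread, and Python's `urllib.parse` stands in for whatever parser would actually be used.

```python
from urllib.parse import quote, unquote, urlsplit

# A UnixFS file name that legitimately contains "?" and "=".
file_name = "asdf?asdf=x"

# Concatenating it unescaped: a URL-style parser sees a query string.
naive = "/ipfs/QmID/" + file_name
parts = urlsplit(naive)
print(parts.path, parts.query)

# Escaping the name as a single segment keeps it inside the path.
escaped = "/ipfs/QmID/" + quote(file_name, safe="")
last_segment = urlsplit(escaped).path.rsplit("/", 1)[1]
print(escaped, unquote(last_segment))
```

The escaping has to happen at the boundary where the UnixFS name becomes an IPLD path segment, since the name stored in the data structure itself stays unescaped.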

prataprc commented 4 years ago

I would rather treat array indexing as a key lookup in which the index is a hashable key; that is, an array should always be interpreted as an associative array. Otherwise the array type brings in the notion of item position, which is one more source of ambiguity: when we say /ipld/QmID/10, we don't know whether 10 is to be interpreted as a numeric index or an associative key.
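The ambiguity can be shown with a small sketch in which the same segment resolves differently depending on the kind of node it meets. `lookup` is a hypothetical helper that models IPLD lists and maps as Python lists and dicts.

```python
def lookup(node, segment: str):
    """Resolve one path segment; the node's kind decides how it is read."""
    if isinstance(node, list):
        # Positional reading: "10" is converted to the integer index 10.
        return node[int(segment)]
    if isinstance(node, dict):
        # Associative reading: "10" is used verbatim as a string key.
        return node[segment]
    raise TypeError("cannot path into a scalar value")


as_list = list(range(100, 120))
as_map = {"10": "value stored under the string key 10"}

print(lookup(as_list, "10"))
print(lookup(as_map, "10"))
```

The path text alone does not say which reading applies; only the data does, which is the ambiguity described above.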