ipld / specs

Content-addressed, authenticated, immutable data structures

Simplify recursion limit to be optional integer #252

Closed creationix closed 3 years ago

creationix commented 4 years ago

The current spec is overly verbose and much more complicated than it need be. This change gets rid of three types (union, struct, and int alias) and also greatly simplifies the encoded form.

Before, we had :
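(The PR's code blocks did not survive extraction. The following is a hedged reconstruction, inferred from details named later in this thread: the union, struct, and int-alias types, and the `{ none: {} }` encoding @rvagg mentions. It may not match the PR's literal diff.)

```ipldsch
type RecursionLimit union {
	| RecursionLimit_None "none"
	| RecursionLimit_Depth "depth"
} representation keyed

type RecursionLimit_None struct {}
type RecursionLimit_Depth int
```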

After this change, it is:
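(Again a hedged reconstruction, not the PR's literal diff: per the PR title, the three types above would collapse into a single optional integer field on the recursive selector, presumably something like:)

```ipldsch
# inside the recursive selector; absence of "limit" would mean unlimited
limit optional Int
```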

rvagg commented 4 years ago

Previous discussion about this here: https://github.com/ipld/specs/pull/207#issuecomment-543406226 and lots below, especially @warpfork's multiple comments with extended thoughts.

Not saying this should be dismissed, just that this is a Chesterton's fence situation with context that needs to be considered. If we stick with the existing then this PR suggests that it's not properly documented so we need to do that.

creationix commented 4 years ago

Yes, I'm aware of the previous discussion and have also talked personally with @warpfork and @mikeal about it. I understand the desire for a signal that this thing might be expensive, but I agree with Mikeal that there are likely better places for this kind of concern, places with broader coverage, and we should keep things simple here in the encoding of the selector.

mikeal commented 4 years ago

Given Tim’s new issues I think it’s worth revisiting this.

If I recall, my view previously was that requiring an explicit limit was mostly meaningless because a node will want to impose its own limit anyway. I didn’t have a strong opinion about how we presented the default but now that we’re seeing actual problems in the way we did it, we might as well just go in and make it better.

Looking into the future when there’s more codegen for IPLD Schema, this issue will crop up again so we might as well fix it while we have the flexibility to do so.

Also want to ping @hannahhoward since she dealt with this last time around.

warpfork commented 4 years ago

I'm still opposed to this because I think removing the explicitness is likely to create an error-prone situation for implementors[‡], leaves a significant choice unprompted to the writers of selectors, and will end up not being a meaningful simplification in practice: nodes accepting arbitrary selectors without recursion limits is an incredibly direct DoS vector, so any hardened system that has encountered reality will require limits to be used anyway. In short, all the same concerns I held last time, I still hold, even after time and reflection.

[‡] -> In general, I think if you've got an enum (or union-aka-sumtype), then it results in clearer/less-branchy code and clearer docs and specs if you add another member to the enum than if you make the whole thing optional. The two are identical in cardinality, of course, but one of them gives you much better vocabulary for the situations.
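To illustrate the footnote (a hypothetical sketch, not code from any IPLD implementation; all names are invented): with an explicit "none" variant, the unlimited case is a named member the caller must spell out, whereas an optional value lets "unlimited" slip through as a silent omission.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Explicit variant: "no limit" is a named member of the sum type.
@dataclass
class LimitNone:
    pass

@dataclass
class LimitDepth:
    depth: int

RecursionLimit = Union[LimitNone, LimitDepth]

def describe(limit: RecursionLimit) -> str:
    # Every caller constructs one of the two members on purpose.
    if isinstance(limit, LimitDepth):
        return f"depth<={limit.depth}"
    return "UNLIMITED (explicitly requested)"

# Optional variant: forgetting the argument silently means "unlimited".
def describe_optional(limit: Optional[int] = None) -> str:
    return f"depth<={limit}" if limit is not None else "UNLIMITED (by omission)"
```

The two types have the same cardinality, but only the first forces the unlimited case to appear in the caller's source text.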

But... this does keep prompting questions. I'm willing to be outvoted on this.

ribasushi commented 4 years ago

TLDR: this is a bad idea, the depth limit should remain "required".

So I actually spent a good several hours triaging a recurring issue where some of our production nodes go into a tailspin repeatedly. The culprit ended up being an api call with unbounded recursion: https://github.com/protocol/bifrost-infra/issues/630#issuecomment-603296827

The core reason this is a thing is the optionality of the limit in go-ipfs. The server is prepared to keep sending gigabytes of data, and the client ( having no API surface through which to request a limit ) sets itself up to consume gigabytes of data.

If the depth-limit remained required, both sides ( server and client implementors ) would have to stop and think before putting in some number. This number might very well be 2^63-1, but it would be supplied consciously, instead of an unwitting default.

:-1: to making recursion limit optional ( or more precisely allowing it to be anything but 1 ~ very-large-int )

warpfork commented 4 years ago

I don't think that infra issue @ribasushi linked to is public, but the gist of it is this command:

SYNOPSIS
  ipfs refs [--format=<format>] [--edges | -e] [--unique | -u] [--recursive | -r] [--max-depth=<max-depth>] [--] <ipfs-path>...
ARGUMENTS
  <ipfs-path>... - Path to the object(s) to list refs from.
OPTIONS
  --format         string - Emit edges with given format. Available tokens: <src> <dst> <linkname>. Default: <dst>.
  -e, --edges      bool   - Emit edge format: `<from> -> <to>`.
  -u, --unique     bool   - Omit duplicate refs from output.
  -r, --recursive  bool   - Recursively list links of child nodes.
  --max-depth      int    - Only for recursive refs, limits fetch and listing to the given depth. Default: -1.
DESCRIPTION
  Lists the hashes of all the links an IPFS or IPNS object(s) contains,
  [...]

Notice how --max-depth is optional, and defaults to unlimited.

And it had exactly the effect we speculated it would: in the wild, people are using this API, specifying -r, and not specifying --max-depth because they don't have to.

And it created a problem.

(Furthermore, numerically, in this case: if the user had specified even a huge number like --max-depth 10000, it still would've arrested the problematic query. So this is also anti-evidence for any claim that forcing users to go through the hoop of specifying an arbitrarily large number will have no practical effect.)
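The arithmetic of that parenthetical is easy to sketch (a hypothetical traversal, not go-ipfs code): a depth cap, even an absurdly large one, turns an unbounded walk into a bounded one, while the default of -1 means no bound at all.

```python
def walk(children, root, max_depth):
    """Walk a graph, refusing to descend past max_depth.
    max_depth=-1 (the ipfs refs default) means unlimited -- the hazard."""
    visited = 0
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        visited += 1
        if max_depth >= 0 and depth >= max_depth:
            continue  # cap reached: stop descending here
        frontier.extend((c, depth + 1) for c in children(node))
    return visited

# A synthetic chain 1,000,000 nodes deep:
children = lambda n: [n + 1] if n < 1_000_000 else []
assert walk(children, 0, 10_000) == 10_001  # huge-but-finite cap: bounded work
```

With `max_depth=-1` the same call would visit all million nodes; with any finite cap, however generous, the work is bounded up front.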

creationix commented 4 years ago

I'm not saying we should allow unbounded recursion by default. I'm saying this isn't the best place to make it explicit. There should be a required parameter when running selectors that limits the overall recursion allowed by the selector globally. Inline inside the selector we're in domain land and should optimize for describing intent in the clearest way. Concerns about DoS attacks and unexpected bugs seem like something that should be properly handled at a more global level when executing the selector, not when defining it.
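What "explicit at the execution layer" could look like, as a hypothetical sketch (`execute_selector`, `BudgetExceeded`, and the budget parameter are all invented for illustration, not an IPLD API): the evaluator, not the selector document, demands a limit, and there is no default to fall back on.

```python
class BudgetExceeded(Exception):
    pass

def execute_selector(children, node, budget):
    """Hypothetical evaluator: the caller must supply a recursion budget;
    there is no default, so "unlimited" cannot happen by accident."""
    if budget < 0:
        raise BudgetExceeded("global recursion budget exhausted")
    for child in children(node):
        execute_selector(children, child, budget - 1)
```

Under this design a selector document needs no limit of its own; forgetting the budget is a compile-time or call-site error rather than a silent unbounded walk.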

rvagg commented 4 years ago

I don't think it's as clear as saying that at this layer it's not important and we should push it to an upper layer. All of our layers are APIs to some extent and should follow good API design practices, including clear signalling of intent and sensible defaults.

Not a strongly held objection on my part, but my leaning is toward being explicit. I wasn't a fan of the complexity of the nodes though: { none: {} }; I think I objected to at least one {} in there at the time.

warpfork commented 4 years ago

I think both these last two comments make excellent observations.

It's definitely reasonable to question where's the best place to make things explicit.

And it's also extremely reasonable to question whether these particular limits are the most useful ones. I'm sure they're not. (If one has more than one recursion clause, this is fairly clearly not a defense against the billion-laughs attack, for example.) Part of the reason we have a union here for the limit type is in admission of this, and in the assumption that we'll see the design of cleverer limits in the future.

At the end of the day I land pretty close to @rvagg 's perspective. We expose a lot of different levels of APIs. We really don't know which ones users of our systems and specs will choose to forward or treat as internal. Defense in depth seems wise.

My opinion is we should be both explicit about limits in the selectors and explicit about limits when they're being evaluated if the selector is untrusted.

In the case of trusted selectors (say it's in a configuration file in a service you own and host locally), the selector document is it. Requiring a second out-of-band configuration of limits would be worse: more work for no gain.

In the case of untrusted selectors, we need to specify limits separately. But we may still have gains here: if the selector document is explicit about its own limits, we can quickly and statically verify whether we like those limits. This would let APIs return errors about over-ambitious selectors instantly and without wasted effort, rather than requiring the selector to be evaluated until the moment it hits a hard limit. (It also might not obviate the need for a hard limit, e.g. a separate limit, based on simple global counting, to fully block the aforementioned billion-laughs attack. But if it turns 99% of non-malicious requests into an instant error with helpful feedback rather than a very belated and costly-to-reach error with no helpful feedback? I'll take that.)
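The static check described here could be quite small. A hypothetical sketch (selector documents modeled as nested dicts with a `limit` key; `MAX_ALLOWED_DEPTH` and `precheck` are invented names, and real selector documents are richer than this):

```python
MAX_ALLOWED_DEPTH = 1024  # local server policy; number invented for illustration

def precheck(selector):
    """Statically scan a selector document (modeled here as nested dicts)
    and reject any declared recursion limit that exceeds local policy,
    yielding an instant error instead of a long, wasteful evaluation."""
    if isinstance(selector, dict):
        limit = selector.get("limit")
        if isinstance(limit, int) and limit > MAX_ALLOWED_DEPTH:
            raise ValueError(
                f"declared limit {limit} exceeds policy {MAX_ALLOWED_DEPTH}"
            )
        for value in selector.values():
            precheck(value)
```

An over-ambitious selector fails this check before any block is fetched; a well-behaved one passes and proceeds to evaluation under whatever hard runtime limits also apply. None of this is possible if the limit lives only outside the document.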

In a hybrid case, where selectors are untrusted but can be submitted to some sort of review process, having the limits embedded in the selector document again helps. An entire selector could be whitelisted after review (perhaps by CID, for example). This isn't possible at all if the limits aren't in the document; and it is a process made much more error-prone by any reduction in explicitness.

Not all of this, of course, is yet written. (In fact, we don't have any interpretation-side limits specified or implemented at all. Applications of selectors so far pretty much land on that "hybrid case".) As such, some of this is speculative. But overall, it seems there are several possible upsides to retaining explicitness. And as loosening explicitness requirements is loosing a genie not easily rebottled... I'd rather not.

vmx commented 4 years ago

I am also in the "make it explicit camp" (perhaps it also comes from me liking Python a lot).

> if the selector document is explicit about its own limits, we can quickly and statically verify if we like those limits.

This sentence stands out for me, making a good point about "why have a limit in two different places".

mikeal commented 3 years ago

Closing for future fix that relies on existing Union behavior.