Do not match downgraded conditions in `try_fetch()`

lionel- commented 1 year ago

Follow-up for https://github.com/r-lib/rlang/issues/1534

Matching conditions across parents indistinctly caused a bunch of failures in ggplot2 revdeps because errors downgraded to warnings by a layer were incorrectly matched by the external try_fetch() that decorates errors with contextual information. Schematically, current dev rlang has this behaviour:

try_fetch(
  error = function(cnd) abort("TILT"),
  try_fetch(
    error = function(cnd) warn("Downgraded error", parent = cnd),
    abort("Parent error")
  )
)
#> Error in `handlers[[1L]]()` at rlang/R/cnd-handlers.R:212:6:
#> ! TILT

To avoid this, the PR changes try_fetch() so that it doesn't match any downgraded conditions (from error to warning or from warning to message). Upgraded conditions are allowed to match. With this change we recover the CRAN rlang behaviour:

#> Warning message:
#> Downgraded error
#> Caused by error:
#> ! Parent error

hadley commented 1 year ago

Given this unexpected consequence are we still sure that matching the expectation hierarchy is the right approach? I'm a little concerned were implementing this based on our gut feeling of it being useful, rather than a deep analysis of a fairly profound change to condition handling.

lionel- commented 1 year ago

I'm pretty sure this intuition goes in the right direction because not matching across parents makes condition handling of specific classes simply too brittle to changes in foreign packages.

This is a case where I'm skeptical of our ability to come up with a deep analysis in the abstract as opposed to getting practical experience with the approach. I haven't run revdeps of ggplot2 yet but if these turn clean I think we should proceed with the change to get experience with this new behaviour.

More generally I feel satisfied with the new behaviour implemented in the PR, though this is only intuition. I think it is less risky to go ahead with this change than it was to start wrapping conditions in dplyr and ggplot2 with chained errors.

lionel- commented 1 year ago

There are probably few folks apart from us using try_fetch() anyway.

Thanks, I think so too.

I have ran the revdeps of rlang, dplyr, tidyr, purrr, and ggplot2. No failures to be reported so I'll merge this now.

lionel- commented 1 year ago

@DavisVaughan @jennybc @gaborcsardi If you have time and are interested, could you please take a look at the approach taken here?

Context: rlang has added support for chained errors. A chained error is one that has another error (or really any condition) in its parent field. In this case, the error message is represented as a chain of causes (e.g. Error in ... Caused by error in...). To chain a contextual error to a causal error, you essentially catch the inner causal error, create a new contextual error, and set the causal error as parent. The end result is that you have an entirely new error object with its own class.

We've started chaining arbitrary errors and warnings in various tidyverse packages: dplyr, ggplot2, and purrr. And we've started receiving user reports about behaviour change caused by chaining errors. For instance if you have:

exec_three_times <- function(fn) {
  for (i in 1:3) fn()
}

With this downstream usage:

f <- function() abort("foo", class = "qux")

tryCatch(
  exec_three_times(f),
  qux = function(cnd) writeLines("gotcha!")
)

In this example the user generates a classed condition and catches it. There's an intermediary stack because their function is run through exec_three_times(), but conditions can traverse package boundaries so that's alright.

Now imagine the upstream maintainer changes exec_three_times() to:

exec_three_times <- function(fn) {
  purrr::walk(1:3, \(...) fn())
}

Since purrr chains all errors to add contextual iteration info, the user tryCatch() is no longer able to catch conditions of class "qux". I think this is a fundamental problem with chained errors that makes condition handling across package boundaries brittle.

To solve this, dev rlang changed try_fetch() so that it matches conditions across the parent ancestry. Then if the user changes their code to use try_fetch(), it becomes resilient to usage of chained errors in third parties:

f <- function() abort("foo", class = "qux")

try_fetch(
  exec_three_times(f),
  qux = function(cnd) writeLines("gotcha!")
)

However this changes caused breakages in revdeps of ggplot2 because of this situation:

Stat layers capture user errors (e.g. in smooth functions) and downgrade them to warnings. The original error is set as parent, so we have a warning chained to an error and the user gets both the warning message about a problem during layer computation and the original error message.
The ggplot builder catches errors with try_fetch() and chains them to a contextual error that provides more information (e.g. layer number/identity).

Changing try_fetch() to match across parents caused it to match these errors downgraded to warnings, since they were stored in the parent field. The downgraded warning was then chained to a contextual error and rethrown as error instead of warning.

To solve this, the PR only allows to match parents of equal or lesser severity. For instance, if an outer try_fetch(class = ) first gets a warning, it's not allowed to match "class" in parent errors. If the caught condition is a message, then no matches occur for parent warnings or errors.

To sum up the main ideas behind the new try_fetch() behaviour are:

We match classes across parents so that contextual errors are skipped. This way wrapping arbitrary errors with a contextual error does not cause third party code to fail.
However we never match parents of higher severity. In other words we don't match parents that were downgraded to a lesser severity, for instance errors downgraded to warnings.

As Hadley mentioned, experimenting with this approach in try_fetch() does not have high costs because it wasn't widely advertised yet. But to solve the problems of matching across ancestry, we might start to recommend it more generally to end users. I just wanted to get your feedback in case you see something obviously wrong in the mechanism.

gaborcsardi commented 1 year ago

Matching on parent errors, and then delivering the matched parent error in try_fetch() makes sense to me. I wish we didn't have to do that. It is also bad that tryCatch() and try_fetch() work differently now. :( But in the current situation, it seems like a good solution that will probably work as people expect it to work, in most cases. (It is still an API breakage, though!)

OTOH, not matching "downgraded" conditions seems weird to me. In the condition system "error" and "warning" are just labels, but with this PR we are now ordering them, which is probably not great. (Is "message"` also treated differently?) It also seems like this should not be a "global" decision, based on the message < warning < error ordering.

Personally, I would look for another solution. E.g. ggplot2 could set a flag in the warning object, that would prevent matching the parent condition. Whether upstream code should match the internal parent errors or not (in this case not) is up to ggplot2. What kind of errors / warnings ggplot2 throws is part of the ggplot2 API, so I think the decision about error matching for conditions originating from ggplot2 belongs to ggplot2, and it should not be a global decision.

Generally, the set of error classes that can be thrown from a function is part the function's API. So making a function throw a chained error instead of letting downstream errors bubble up is an API change. This is of course painfully obvious now.

lionel- commented 1 year ago

In the condition system "error" and "warning" are just labels, but with this PR we are now ordering them, which is probably not great.

It's true that in the base condition system these classes just represent different signalling protocols, at least by default. However these signalling protocols do have an implicit ordering in terms of severity so it doesn't seem too far fetched to match this ordering in the handling system.

There are situations in base R where the class ordering does not necessarily match the severity of the signalling protocol, which possibly makes the ordering unprincipled. For instance, it's possible to downgrade an error to warning by just resignalling it with warning() without changing the class or chaining it to an actual warning condition. But in this case, any tryCatch(error = ) will incorrectly match the downgraded error. To avoid this, we assume in rlang that proper condition handling requires matching the class to the signalling protocol. This is why rlang does not provide any way of signalling for instance errors with the warning protocol. cnd_signal() selects the protocol (stop(), warning(), or message()) based on the class alone and I think that's a good principle to follow with base R too (though there aren't any tools to help with that).

Personally, I would look for another solution. E.g. ggplot2 could set a flag in the warning object, that would prevent matching the parent condition.

I'd prefer not have to modify foreign conditions in this way, it'd be surprising for third party users to find new flags in their condition objects, and may cause unexpected behaviour or failures in some cases.

Also we can't really make try_fetch() match parent errors without making sure this doesn't break ggplot2. That's the one package we can't break on purpose without a lengthy cycle of resiliency measures because of how large the user base is.

Whether upstream code should match the internal parent errors or not (in this case not) is up to ggplot2.

I think there is a problem with making ggplot2 (and dplyr, and tidyr, and purrr, and all sites of error chaining) responsible for the behaviour of upstream condition handlers: it makes it a complicated design decision that all users will have to make over and over when rethrowing errors.

One important principle of this PR is to find a good default that makes package interop in the presence of conditions reasonably consistent and predictable without requiring users to think and make decisions about very subtle semantics.

It's worth noting that despite this default the system preserves full flexibility because you can write a handler for class "condition" and implement the matching yourself. try_fetch() has withCallingHandlers() behaviour so you can decline to handle a condition.

gaborcsardi commented 1 year ago

My point is, the condition system itself does not differentiate between errors, warnings, etc. What you call "signalling protocol" is not part of the condition system. All messages, warnings and errors are emitted with signalCondition(), essentially.

It is true that cnd_signal() already breaks this assumption, that does not seem right, either, btw. Anyway, this is not really important here, I guess.

I'd prefer not have to modify foreign conditions in this way,

You'd set it in the new condition, not in the foreign one. E.g. you'd set match_parents = FALSE to mean: do not look at the parent conditions.

Also we can't really make try_fetch() match parent errors without making sure this doesn't break ggplot2.

Yeah. You could first update ggplot2 to throw the right conditions, and then update rlang to handle these, but you'd need to tell people to update update ggplot2 after the new rlang release, which is not really great.

I think there is a problem with making ggplot2 (and dplyr, and tidyr, and purrr, and all sites of error chaining) responsible for the behaviour of upstream condition handlers: it makes it a complicated design decision that all users will have to make over and over when rethrowing errors.

Indeed. But that is the right place to make this decision, because the kinds of errors you throw from a function is part of that function's API. It is not actually that complicated, because the default usually works, and only in special cases would you want to add match_parents = FALSE.

lionel- commented 1 year ago

It is true that cnd_signal() already breaks this assumption, that does not seem right, either,

This was a conscious decision of ergonomic design based on the idea that separating the error class from the signalling protocol makes the system hard to work with.

E.g. you'd set match_parents = FALSE to mean: do not look at the parent conditions.

Ah got it, this makes sense. I see two issues:

We would probably want to do that every time we downgrade conditions with error chaining. So it should probably be the default behaviour somehow, not an opt in.
This would prevent warning handlers from legitimately matching across parents. So we get the same brittle package interop issue as in the purrr example above, but with warnings.

lionel- commented 1 year ago

it should probably be the default behaviour somehow, not an opt in.

We could do it from abort(), warn() etc: inspect the class of parent and set the flag if we detect downgrading.

However this would introduce a behaviour dependency between try_fetch() and abort(). So it seems to make more sense to detect the downgrading from try_fetch() instead.

The flag solution is more flexible but I'm not sure this flexibility is good here. It seems to make it harder to reason about the handling system. But maybe not a problem in practice.

gaborcsardi commented 1 year ago

This would prevent warning handlers from legitimately matching across parents. So we get the same brittle package interop issue as in the purrr example above, but with warnings.

I don't understand this. If we use the match_parents = FALSE marker, that works the same way for all conditions, warnings included.

All this said, if you are comfortable with this PR as merged, I think that's fine as well, I am kind of just an outsider talking here...

DavisVaughan commented 1 year ago

As someone not familiar with ggplot2 internals, this still feels like a code smell

Stat layers capture user errors (e.g. in smooth functions) and downgrade them to warnings. The original error is set as parent, so we have a warning chained to an error and the user gets both the warning message about a problem during layer computation and the original error message.

To me this kind of feels like off label usage of the parent field, so personally I'd look to see if ggplot2 could be changed to avoid this, and then we'd retain the feature of matching across any conditions, even downgraded ones

lionel- commented 1 year ago

@DavisVaughan This is an intended feature of rlang chained conditions.

How would you downgrade an error to a warning with proper message reporting for users? We currently don't have any tools for this except chaining conditions.

gaborcsardi commented 1 year ago

I think the root of the problem is that usually we rethrow errors (conditions) because we want to add more context to the original error. This is kind of an inheritance relationship between the new error (class) and the parent error (class), even though it is not implemented that way. In this case matching the parent conditions makes sense.

Whereas in the ggplot2 case, we just want to keep the original error, in case it is useful for debugging. This is kind of compositing, instead of inheritance. In this case matching the parent conditions does not make sense. Hence my suggestion to make it possible for ggplot2 to opt out of the parent matching.

DavisVaughan commented 1 year ago

@lionel- I was thinking ggplot2 could somehow store the error in a field other than parent that would still get chained but wouldn't be matched against. The word "parent" just feels somewhat wrong in that case, and I think @gaborcsardi's reasoning above about it being more like composition is why.

So some way for ggplot2 to opt out does sound good. But I guess there could be 2 choices of how to do it:

Storing it in a field other than parent lets you opt out on a per condition basis (or it could be stored in parent, but abort() could gain an argument to disable parent matching)
Making it an argument to try_fetch() lets you opt out on a per expression basis

lionel- commented 1 year ago

@gaborcsardi I like this framing but if we go in this direction we probably want a new field for the composition case to make the inherited and composition cases well separated. And then how do we handle the reporting? Via specific cnd-message methods or via rlang's base class method as we do for chained error? Do we still report "Caused by" as in the inherited case? I guess we can do it just like chained errors.

@DavisVaughan Your latest message just came in, and I think we are saying the same thing, which is a good sign?

DavisVaughan commented 1 year ago

@lionel- yea it looks like we are saying the same thing about having a new field

lionel- commented 1 year ago

Thanks for the discussion. I think this all makes sense but poses two issues:

try_fetch() will have to detect calls from ggplot2 and implement a compatibility path based on ggplot2's version and maintain this for a few years (cc @thomasp85). The breakage is too important to ignore (it turns warnings into errors which likely breaks 1000s of scripts).
We need a good name for that composition field, and either namespace it (causing an inconsistency with existing rlang field names) or risk breaking subclasses.

lionel- commented 1 year ago

Maybe it should be a boolean flag after all, as Gabor suggested, because we don't want to allow both ways of chaining at the same time.

How about parent = cnd, inherit = TRUE?

DavisVaughan commented 1 year ago

I think the important thing in my head is that the argument lives in abort() rather than try_fetch(), so yea that sounds fine to me

lionel- commented 1 year ago

I think this should solve the last issues:

Call the parameter .inherit and store it in cnd$rlang to avoid clashes with subclasses.
In warn() and inform(), make the default of .inherit detect downgraded conditions based on parent's class. Then ggplot2 doesn't need any changes and users don't need to worry about the inherit parameter except in very specific cases.

So by default the behaviour is essentially the same as in the current PR, and the severity ordering of signalling protocols is still recognised by rlang. But this can now be controlled in warn(), abort(), and inform(). It will also affect the behaviour of cnd_inherits() since this is no longer just a try_fetch() thing but a property of a contextual error.

Thanks for your helpful inputs @gaborcsardi and @DavisVaughan!

r-lib / rlang

Do not match downgraded conditions in `try_fetch()` #1573