tidyverts / tsibble

Tidy Temporal Data Frames and Tools
https://tsibble.tidyverts.org
GNU General Public License v3.0
527 stars 49 forks source link

dplyr no longer using dplyr°_error class #270

Closed romainfrancois closed 2 years ago

romainfrancois commented 2 years ago

We're in the process of releasing dplyr 1.0.8, which will stop adding the dplyr_error class to errors. Please consider this pull request.

romainfrancois commented 2 years ago

@lionel- is there some alternative way to test for "this error originated from dplyr" ?

lionel- commented 2 years ago

I don't think so. For which errors did we remove this class?

romainfrancois commented 2 years ago

All of them. We used to mark errors with class = "dplyr_error" but then we decided not to.

Can't we use cnd$trace$namespace to figure out that abort() was called from within dplyr ? That may be risky I guess.

library(dplyr, warn.conflicts = FALSE)

withCallingHandlers(summarise(mtcars, stop("boom")), error = function(cnd) {
  print(cnd$trace$namespace)
})
#>  [1] "base"  "dplyr" "dplyr" "dplyr" "base"  "dplyr" "base"  "dplyr" NA     
#> [10] "base"
#> Error: Problem while computing `..1 = stop("boom")`.

Created on 2021-11-17 by the reprex package (v2.0.1.9000)

lionel- commented 2 years ago

Can't we use cnd$trace$namespace to figure out that abort() was called from within dplyr ? That may be risky I guess.

Yeah, that's not something we'd want to do or recommend. But in general there should be no need for testing that an error originates from a particular package.

For instance in this particular case I don't understand what's being tested because the expected error class is so general.

romainfrancois commented 2 years ago

I suppose the idea of expect_error(count_gaps(tsbl, .name = NULL), class = "dplyr_error") was testing that it's not an error from tsibble but from dplyr.

There are a few packages that do expect_error(class = "dplyr_error")

earowang commented 2 years ago

Yea, it's intended to test if the error is from {dplyr}.

When will the new version of {dplyr} be pushed to CRAN?

lionel- commented 2 years ago

I can see what the test is doing but I don't understand the intent of the test. What dplyr error is expected? Maybe this should be turned into a snapshot test instead?

In general, I'm conservative about adding error classes because they instantaneously become part of the public interface and may be depended upon, as we see here. (And for a while in the past testthat encouraged users to add class = arguments for classed errors, which worsened the issue.) For the same reason, we generally shouldn't remove classes. How many packages are depending on "dplyr_error"?

romainfrancois commented 2 years ago

How many packages are depending on "dplyr_error"?

Just a few. I agree that moving to snapshot tests will be more useful. We could add them back, but given that's just a few, I believe package maintainers can be convinced.

Also, I'm not even sure that all the errors from dplyr used to have the class. Would it make some sense to have abort() keep top_env(caller_env()) somewhere ? But you seem to rule this out here:

But in general there should be no need for testing that an error originates from a particular package.

romainfrancois commented 2 years ago

@DavisVaughan might have views on this since another package using the class is yardstick: https://github.com/tidymodels/yardstick/pull/239

lionel- commented 2 years ago

Just a few. I agree that moving to snapshot tests will be more useful. We could add them back, but given that's just a few, I believe package maintainers can be convinced.

Sounds good.

Would it make some sense to have abort() keep top_env(caller_env()) somewhere ?

There used to be an rlang issue about keeping track of the package in error conditions but we never went through with it. This is probably not needed and this sort of public properties creates unexpected dependencies. For instance it might make it more difficult to move error generation to a backend package.

DavisVaughan commented 2 years ago

I think it was useful in yardstick to capture the text output of the error message, but I think testing for the error class explicitly was a mistake because yardstick doesn't own the "dplyr_error" class.

I'm going to just make mine into snapshot tests, and I think that would be the most robust thing to do here too

romainfrancois commented 2 years ago

ping @earowang. Any chance for a release with this fix ? We aim to release dplyr early next week.

earowang commented 2 years ago

Thx. Will submit the new version to CRAN soon.