r-lib / rprojroot

Finding files in project subdirectories
https://rprojroot.r-lib.org/
Other
149 stars 23 forks source link

Improve `find_root()` #83

Closed salim-b closed 9 months ago

salim-b commented 2 years ago

This PR

krlmlr commented 2 years ago

Thanks. To avoid mismatch between the other has_() functions, should we add a new optional argument to find_root() instead?

salim-b commented 2 years ago

@krlmlr I've updated my PR to modify find_root() instead. I had a hard time running the tests (I dont' really understand the failing ones). Hope it's fine.

krlmlr commented 11 months ago

Checks are failing here. Can you please take a look?

krlmlr commented 11 months ago

Oh, you said earlier it's difficult. I handle that.

aviator-app[bot] commented 11 months ago

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes. Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was closed without merging. If you still want to merge this PR, re-open it.


See the real-time status of this PR on the Aviator webapp.

Use the Aviator Chrome Extension to see the status of your PR within GitHub.
krlmlr commented 11 months ago

See https://github.com/r-lib/rprojroot/issues/96 for advice on running the tests.

salim-b commented 11 months ago

See #96 for advice on running the tests.

Thanks for the hint.

I've overhauled the PR, should be good to go now.

krlmlr commented 11 months ago

Thanks for your work on this. I now realize that my request to add a new argument to find_root() violates type stability principles: https://design.tidyverse.org/out-type-stability.html . I'm sorry for the extra work that went into it because of my misjudgment, but I think we're getting somewhere.

How do you like can_find_root() ?

I prefer small focused PRs. Happy to review a separate PR for the last two bullet points first. I can also work with your commits and create suitable PRs.

salim-b commented 11 months ago

I now realize that my request to add a new argument to find_root() violates type stability principles.

Lol. πŸ˜„ That was my initial approach with has_root()... Anyways, we probably want to factor out the common code of find_root() and the new binary function.

How do you like can_find_root()?

Seems a bit "bumpy language" to me... but has the upside of "referring" to find_root().

Some half-baked alternative suggestions:

In the end, I'd prefer the initially suggested has_root(). It's intuitively understandable and short. The confusion potential with the existing has_*() root criterions seems minimal to me. But it's your choice, of course. ☺️

I prefer small focused PRs. Happy to review a separate PR for the last two bullet points first. I can also work with your commits and create suitable PRs.

Sure, I'll submit them. See #97 and #98.

krlmlr commented 11 months ago

Yes. The initial proposal was nice. Again, sorry. I'm really having a hard time coming to terms with the has_ or is_ prefix being used for two different purposes.

root_find() and root_has() perhaps? That's a lot of code and docs to refactor, though.

Did we discuss the motivation for this feature? What does the caller do if there's no root?

I wonder if returning NULL violates type stability.

salim-b commented 11 months ago

root_find() and root_has() perhaps? That's a lot of code and docs to refactor, though.

Hm, that is more appealing indeed. Especially if more "root"-specific functions should be added in the future!

Regarding refactoring: I think I could do this in another PR if you'd like. I would make find_root() an alias for root_find() to maintain backwards compatibility, I guess. Maybe with a deprecation warning of the former?

Did we discuss the motivation for this feature?

No, I don't think so. My motivation is that I'd like to have a "native" way to logically check whether root criterion(s) are fulfilled. I.e. instead of

!inherits(try(rprojroot::find_root(my_criterions),
              silent = TRUE),
          what = "try-error")

What does the caller do if there's no root?

I'm not sure if I understand your question. In the first place, the caller is able to learn whether a root exists or not without the risk of throwing an error (or having to defuse find_root() in a way similar to the above example, i.e. has an easy means that returns them a logical scalar. Then they can do whatever they want with that information.

I wonder if returning NULL violates type stability.

I think the proper way to maintain type stability is returning empty vectors of the proper type instead of NULL, i.e. character() for a function that is supposed to return a character vector. You want to stop throwing errors in find_root() and instead return NULL if no root is found?

krlmlr commented 11 months ago

Thank you for your patience with this.

Can we please discuss the motivation first? This may influence the implementation.

After the caller calls root_has() and sees FALSE, what do they do? Fail, use a default, use another codepath, ...?

After the caller calls root_has() and sees TRUE, what other options do they have in addition to calling root_find() and using that?

Do you have a practical use case in mind? Or is it just to get a clean interface that doesn't use conditions in the case of no root?

salim-b commented 11 months ago

Can we please discuss the motivation first? This may influence the implementation.

Sure. I consider rprojroot's root criterions a suitable central place to define how a certain project type (R package, R package with testthat set up, pkgdown site etc. pp.) is to be detected, i.e. to collect and "standardize" this knowledge. If such a root criterion is updated or corrected/improved (like e.g. in #88), all the downstream rprojroot users benefit from it.

The information whether a path is part of a certain project type can be leveraged in various ways. I don't think we can and should try to anticipate them in such detail as your questions imply.

But to give an example: pal::is_pkgdown_dir() is a simple wrapper around the rprojroot::is_pkgdown_dir root criterion that could be deprecated in favor of rprojroot::root_has(rprojroot::is_pkgdown_dir). pal::is_pkgdown_dir() is e.g. used in pkgpurl::purl_rmd() to check whether it is sensible to generate a pkgdown reference index. pal::is_pkg_dir() analogously is a general-purpose building block in the same spirit (which could be deprecated in favor of rprojroot::root_has(rprojroot::is_r_package).

Does this answer your questions?

krlmlr commented 9 months ago

Thanks again for your patience.

Objects of class root_criterion are documented in a much better way than other examples of code I wrote: https://rprojroot.r-lib.org/reference/root_criterion.html#value-1 . In particular, the testfun component, which pal::is_pkgdown_dir() is already using, is a perfectly valid way to check if a criterion is fulfilled.

I understood your proposal in the following way: walk from the current directory to the root, return TRUE if any of the directories fulfills the criterion. This seems different from what pal::is_pkgdown_dir() is doing. What am I missing?

To move forward, I'll extract the test changes that are unrelated to the functionality into a separate PR.

Is there a way to resolve #84 independently of this discussion?

krlmlr commented 9 months ago

The test changes were already merged independently.

The least invasive and easiest way would be to return a classed condition, perhaps of class c("rprojroot_not_found", "error"). Users could tryCatch() and react to this specific condition. What do you think?

The other useful extension I can think of is to add an accessor function crit_testfun() or perhaps crit_call_testfun() that makes it easier to iterate over the list of test functions and call them for a specific path. I believe this would better solve the original pal::is_pkgdown_dir() use case.

Let's discuss #84 separately.

Sorry it took me so long to arrive at a conclusion here.

salim-b commented 9 months ago

I understood your proposal in the following way: walk from the current directory to the root, return TRUE if any of the directories fulfills the criterion. This seems different from what pal::is_pkgdown_dir() is doing. What am I missing?

pal::is_pkgdown_dir() currently does this:

rprojroot::is_pkgdown_project$testfun |>
    purrr::map_lgl(\(x) x(path = path)) |>
    any()

I'd like to introduce an additional check_parent_dirs arg, so it would basically become:

if (check_parent_dirs) {

  result <- rprojroot::root_has(criterion = rprojroot::is_pkgdown_dir,
                                path = path)
} else {

  result <-
    rprojroot::is_pkgdown_dir$testfun %>%
    purrr::map_lgl(~ .x(path = path)) %>%
    any()
}

result

Is there a way to resolve https://github.com/r-lib/rprojroot/issues/84 independently of this discussion?

Yes, the necessary changes to find_root()'s internal logic are included in this PR and can certainly be extracted. I'll submit a new PR once we decided/agreed on the way forward. See https://github.com/r-lib/rprojroot/pull/100 πŸ™‚

The least invasive and easiest way would be to return a classed condition, perhaps of class c("rprojroot_not_found", "error"). Users could tryCatch() and react to this specific condition. What do you think?

You mean to change find_root() to return that classed condition? I'd prefer a separate function root_has() as you suggested above, which would never fail but always return a boolean. Of course, we could still change find_root() (or root_find() if the function is to be renamed) to return that classed condition. My main concern is just to have a dedicated, straightforward way to test (TRUE/FALSE) for any root criterions (i.e. without the need to catch potential errors).

The other useful extension I can think of is to add an accessor function crit_testfun() or perhaps crit_call_testfun() that makes it easier to iterate over the list of test functions and call them for a specific path. I believe this would better solve the original pal::is_pkgdown_dir() use case.

If I understand correctly, this would be +/- the same as introducing a dedicated root_has() function, just with a different name, no?

krlmlr commented 9 months ago
tryCatch({ find_root(); TRUE }, rprojroot_not_found = function(e) FALSE)

looks straightforward to me. If we agree on the classed condition, the question becomes: do we add an exported function in rprojroot for this one-liner? The answer to that question might be "no", or perhaps "yes" if we manage to find a good name and interface logic. Currently, we're in a place like "We want this new functionality, but we need a new function because of type stability, and naming is hard." The classed condition would get us to "We have the new functionality, the next step is to wrap it up nicely, but everybody can do the wrapping in their code today if needed."

The new crit_call_testfun() would call all $testfun members just for one path, and not walk the parent directories if not found.

salim-b commented 9 months ago

Ok, I see. I guess my aversion against error handling was unsubstantiated. Looks like an elegant way forward πŸ‘

The new crit_call_testfun() would call all $testfun members just for one path, and not walk the parent directories if not found.

Got it. This seems like a sensible building block.

How do we proceed? Do you want me to submit a PR adding classed conditions to find_root()?

krlmlr commented 9 months ago

Thanks. Yes, a classed condition would be nice. Do we want crit_call_testfun() or crit_testfun() ?

salim-b commented 9 months ago

Do we want crit_call_testfun() or crit_testfun() ?

The crit abbreviation hasn't yet been used in naming objects in rprojroot, it appears. So to stay consistent, I'd instead suggest test_criterion(). "Test" ist commonly used for binary checks, I think (besides unit testing).

krlmlr commented 9 months ago

check_criterion() ? We can always rename later if the name is sufficiently unique.