Closed dylanbeaudette closed 3 years ago
I think it is fine because they are properly set up as generics for S4 classes.
ok
While it is "fine" -- I do agree it is possibly unwise -- it is a little hairy given that we have defined additional arguments that are not SoilProfileCollections that slightly differ from the base function definitions... I did look into this somewhat when I had to sort out the S4 documentation for these things.
Both split and union have several possible conflicts in base/other packages -- I am not sure whether you were getting at "can we" or "should we" -- the latter is more philosophical and relates to the naming issue #140
I'm more concerned with should.
... which should be discussed in the other issue probably. Can you elaborate? I don't see an inherent issue other than that it generates a message on load.
Possibly relevant: https://stackoverflow.com/questions/28321655/r-creating-a-generic-function-for-split-from-package-base-in-the-global-env
There are probably safer ways we can handle these masking -- and possibly ways to avoid the messages while retaining names -- if we are clever.
Also, I was wrong to be so definitive in my first response. As you said, we should think carefully about this.
filter
and split
are "fine" in that they are unambiguous generics for SoilProfileCollections.
union
is a problem because it takes a list as an argument. Not a SoilProfileCollection. So we are not dispatching it via S4. It has a very different method signature from the other union
and it will mask it in an ambiguous way. So, the user would have to use base::union
if aqp is loaded. This is not ideal at all.
r.bind.SoilProfileCollection
kindof makes/made more sense (currently deprecated but functional). Usually we have to put things into a list anyway. I can see the value of having both abilities -- however when we cant be sure we are passing a SPC we probably shouldn't mask any names. The functionality of union
(and split
) is sufficiently different from both of the base
methods. Indeed, aqp::filter
is quite a bit different from stats::filter
and is more like base::subset
IMO. These are all good fodder for discussion of renaming to something that will not mask, or will have behavior similar to base
but for SPCs.
See the changes I needed to make for documentation / dispatch of plot
-- relevant to my above comments about changing the default arguments that are used in a generic. It is one thing to add arguments, but it is another to remove them. For plot, we need to define a y
argument ... because that is what the generic contains.
We can put the documentation to the main Rd file where plotSPC
is defined. Now ?plot
shows the {aqp} method, if loaded.
https://github.com/ncss-tech/aqp/commit/0b5ee26597df93d4bf0ba0313d8fe37c2b40764f
Thanks for working on this. Chalk this up to another major set of decisions: generic (possible overloaded) function names vs. something like unionSPC
. Conflicts with other packages aren't as much of an issue, but conflicts with base R seems like a recipe for confusion.
I recently added trunc,SoilProfileCollection-method
as a wrapper method for glomApply(spc, function(p) return(c(constant_top, constant_bottom)), truncate=TRUE)
. It produces no messages about masking -- the generic is left unaltered.
Since the original base::trunc
includes ...
in the generic method signature, I can add whatever arguments I want to the S4 version of the method (for SPCs). The same cannot be said for our current usage of filter
, union
and split
. I think w.e should consider changing these names entirely. I am inclined to fix filter
since it has never been on CRAN. base::subset
has the correct functional connotation and method signature (x, ...)
Thanks for trunc
, I can think of several use cases.
As for re-naming, I don't mind using the SPC suffix for most of these short function names: plotSPC
, unionSPC
, filterSPC
, splitSPC
. Extending subset
is another option, your call as you wrote the back-end code. Once we decide, I'd like to deprecate subsetProfiles
as there isn't really any reason to have that much overlapping functionality.
The base::filter
is for linear filtering on time series. The reason to use filter
in an aqp function name at all is b/c "a lot" of R users "think of" it as the verb that does subsetting using logical expressions. In base, the verb is subset
. "Under the hood" filter and the other methods that use NSE use rlang
.... but that is no reason to use dplyr
verbs per se. We can have filterSPC
as an alias, rather than adding filter
for the first time on CRAN and deprecating it immediately.
Names in base like min
max
and unique
are major conveniences for SPCs -- and they essentially do what their verb name says as best they can for SPCs. They can be implemented via S4 without changing or affecting default generic -- and don't issue messages on load -- is that the bar we set for whether "SPC" gets added to a function or not? Seems like a decent practical limitation to use to guide naming of the fundamental generics that work with the SPC class.
Side note: I wish aqp::unique
was more easily usable with %>%
-- if it returned an SPC object and not a numeric vector that would be possible. However, it returns a numeric index.
This is relevant to the discussion because we can add an argument. The base::unique
generic includes ...
... however I think that the argument should be to use the current behavior and new default should be a SPC result. I failed to explain/convince you of this in #129. I am curious how much work it would be to fix that downstream (e.g. add an argument to return IDs, or refactor) the stuff for SoilWeb etc. where the MD5 hashes are used.
The
base::filter
is for linear filtering on time series. The reason to usefilter
in an aqp function name at all is b/c "a lot" of R users "think of" it as the verb that does subsetting using logical expressions. In base, the verb issubset
. "Under the hood" filter and the other methods that use NSE userlang
.... but that is no reason to usedplyr
verbs per se. We can havefilterSPC
as an alias, rather than addingfilter
for the first time on CRAN and deprecating it immediately.
I'm fine either way, even though I periodically (ha! get it?) use base::filter
. I've already started writing filter(x, ...)
in my code, but not enough time to set precedent. You call on this one. If we keep filter
, then adding filterSPC
is a fine backup plan in case we need to revert. Good idea.
Names in base like
min
max
andunique
are major conveniences for SPCs -- and they essentially do what their verb name says as best they can for SPCs. They can be implemented via S4 without changing or affecting default generic -- and don't issue messages on load -- is that the bar we set for whether "SPC" gets added to a function or not? Seems like a decent practical limitation to use to guide naming of the fundamental generics that work with the SPC class.
Good idea. This means that plot,SoilProfileCollection
will become plotSPC
. Filter is kind of a special case, but in general I'd like to avoid masking base functions when we cannot match the original argument pattern of generic functions.
Side note: I wish
aqp::unique
was more easily usable with%>%
-- if it returned an SPC object and not a numeric vector that would be possible. However, it returns a numeric index.
True, but a lot has changed since the original motivation / implementation of aqp::unique
.
This is relevant to the discussion because we can add an argument. The
base::unique
generic includes...
... however I think that the argument should be to use the current behavior and new default should be a SPC result. I failed to explain/convince you of this in #129. I am curious how much work it would be to fix that downstream (e.g. add an argument to return IDs, or refactor) the stuff for SoilWeb etc. where the MD5 hashes are used.
There was no failure, rather the discussion on unique
was put on the back-burner while we figured out aqp::split
and aqp::union
. At this point I am fine with breaking unique
in favor of returning an SPC. I've got a reasonable list of places where I use this function and I think that I can safely work around the changes. Moving that discussion to #159.
Good idea. This means that
plot,SoilProfileCollection
will becomeplotSPC
. Filter is kind of a special case, but in general I'd like to avoid masking base functions when we cannot match the original argument pattern of generic functions.
I think plot,SoilProfileCollection
is fine actually since I added the y argument to our usage of it. It does nothing presently. This is how raster
did it. And of course plotSPC
already exists.
I'm fine either way, even though I periodically (ha! get it?) use
base::filter
. I've already started writingfilter(x, ...)
in my code, but not enough time to set precedent. You call on this one. If we keepfilter
, then addingfilterSPC
is a fine backup plan in case we need to revert. Good idea.
I am suggesting that I define subset,SoilProfileCollection-method
and filterSPC
. Whether we leave filter
defined and deprecated for the first time we publish to CRAN is up to you -- but I was suggesting we not do that. Dump filter
altogether is what I am leaning towards. Or I could do nothing and allow it to warn about the re-definition of the base R filter, as dplyr does every time it loads.
This issue may not have even started if it weren't for the fact that we get messages every time we load the aqp package on its own. If we adhere to the rule I stated above where we do not ever re-define the generics of base methods, we will only see those messages when we have non-base packages with same names (e.g. dplyr). I can look into what I can do to avoid even those messages -- as they may be very similar to the plot,SoilProfileCollection
case above.
Good points, and I agree with your suggestions. There will always be some noise at package load time, esp. with the popularity of incantations like library(tidyverse)
.
I have not forgotten about union
. I think I have wavered somewhat in my resolve for tackling some of these... so rallying some last minute ideas.
The methods that can be dispatched via S4 for SoilProfileCollections are "safe" -- so split
is OK -- but union
still remains. Really, what we call aqp::union
does not behave sufficiently like base::union
to even call it something like unionSPC
.
In https://github.com/ncss-tech/aqp/tree/union I took a crack at resolving this and propose a new syntax.
I propose we:
aqp::union
other than the list. they are not needed. spatial result can revert to default for non-conformal case, and the na.rm
argument is rarely used and simply triggers an error anyway. The method
argument is interesting but not developed, and alluded to a variety of gaps in SoilProfileCollection methods. These could be addressed as punion
pintersects
psetdiff
psetequal
etc. analogous to the {data.table} set operations e.g. funion
. This could warrant a heavier use of {digest} -> aqp::unique
/ lunique
aqp::union
in next CRAN release, replacing it "officially" with pbindlist
(or some to-be-determined name)pbindlist
draws from data.table::rbindlist
the workhorse of the method -- we are "profile" binding rather than "row"rbind.SoilProfileCollection
, since there is no reason IMO to not support ...
expansion aside from potential for misuse (do.call
) but otherwise it has been deprecated since 2018 and it is time to goaqp::union
to match generic of base::union
aqp::union
still "works" except it now issues a deprecation message (and possible IDE warnings about missing y
default argument due to borrowing the base R generic)Tagging #168; moving some above commentary to draft PR
I am not sure how commonly folks run into masking issues with aqp, but was closing some browser tabs and thought I would add a link to this discussion.
Might be worth working up a brief aqp-specific tutorial describing ideas in this article as it pertains to common packages. In R 3.6.0+ we can use the conflicts.policy
options to have more control over effects of library
/require
calls.
https://developer.r-project.org/Blog/public/2019/03/19/managing-search-path-conflicts/index.html
These are practical base R solutions to the "inevitability" of masking -- and highlight the difference between masking that is intentional (e.g. dplyr replaces union, setdiff etc with methods that work generically in place of base equivalents) versus unexpected (two packages that were not 'designed' to co-exist -- e.g. aqp and dplyr).
It also highlights that by default the generics that are handled by S4 are not considered conflicts unless in strict mode. While there are packages that manage conflicts, as author says this behavior should be lightweight and part of base R. It would be great if, as he suggests, these things could be defined somehow in the package DESCRIPTION.
Thanks for offering ideas and several possible solutions.
- remove all of the arguments to
aqp::union
other than the list. they are not needed. spatial result can revert to default for non-conformal case, and thena.rm
argument is rarely used and simply triggers an error anyway. Themethod
argument is interesting but not developed, and alluded to a variety of gaps in SoilProfileCollection methods. These could be addressed aspunion
pintersects
psetdiff
psetequal
etc. analogous to the {data.table} set operations e.g.funion
. This could warrant a heavier use of {digest} ->aqp::unique
/lunique
I like the idea of removing all arguments to aqp::union
until we figure out what to call the replacement. I'd like to talk briefly about the path we want to take forward: p-****
to build off related set operations (that is appealing) vs. more connotative and possible conflicting words like combine
. I don't know which is more effective, but the set operation approach is growing on me.
- deprecate
aqp::union
in next CRAN release, replacing it "officially" withpbindlist
(or some to-be-determined name)
Yes please.
- the name
pbindlist
draws fromdata.table::rbindlist
the workhorse of the method -- we are "profile" binding rather than "row"
Yeah, I get that but the name isn't appealing to me. The logic is sound but I think that a more connotative name will help adoption.
- remove or un-deprecate
rbind.SoilProfileCollection
, since there is no reason IMO to not support...
expansion aside from potential for misuse (do.call
) but otherwise it has been deprecated since 2018 and it is time to go
I think that we should remove it.
- until removal change the default arguments of
aqp::union
to match generic ofbase::union
Yes, thanks.
aqp::union
still "works" except it now issues a deprecation message (and possible IDE warnings about missingy
default argument due to borrowing the base R generic)
Good idea so as not to break things just yet.
I want to add to this that the choice to mask stats::filter
has driven me nuts since I implemented it -- I don't like the noise on loading. And I also do not like that it is a kinda nuanced to use with {dplyr}. I would really like to encourage aqp::subset
which is "safe" and an appropriate {base} verb. \
Proposed changes in #203--coupled with removal of some of our older previously deprecated functions cough aqp::union
--should allow us to close this issue.
I'm thinking that it may be unwise to cause the following: