Closed morungos closed 10 months ago
@morungos I don't think we should expend much (any) effort on this. subset
is used (almost) like this in all sorts of R functions - see lm
for example. However, in lm
, there is no default value (which will stop Simon setting one), and subset is picked up (when present) using match.call
. (Another alternative would be to use the missing
function, but this is frowned upon these days - the recommendation is to put a default value and test for something other than the default, which is what I did). I can easily implement the match.call
option, with no default, if that is better practice than using a default of NULL
The subset command is really, really useful. It is designed for interactive use (so putting it at the top of a script is a strange thing to do.) When I have run the OSPAR assessment, I have 30000 time series. I don't want to produce graphs for all of them (same with the reports). But I do often want to look at specific series, or groups of series, and this is easily done using subset.
For relatively small assessments, you might as well produce all the plots and all the reports, so the subset command won't ever be needed.
Happy to discuss
So for plots, we just remove that argument?
Let's think, because I did get it to work pretty well here, but if everywhere else does subsetting in a different way (personally, I think passing an expression, not a string, is probably the right way) we should not be incompatible.
I noted that @swamap used this with string values, not expressions. At the least, we should probably make sure that subset value types are clearly documented. I'll check that when I am next at my desk.
We need to keep the subset argument in plot_assessment, report_assessment, run_assessment and update_assessment. Personally I would like to keep it as it is. The subset argument is pretty well documented in run_assessment. It can be a logical, or an expression that will be evaluated to a logical within the timeSeries data frame.
That's almost fine, with a couple of issues:
NULL
, which isn't either a logical or an expression.I'd rather not do nothing here, because @swamap's usage is reasonable, and the current code cannot handle setting a variable to the default value and passing the value through a variable, because, essentially, it is using an unevaluated value rather than an evaluated one -- even if you use expressions right.
The fact that:
x <- NULL
plot_expression(..., subset = x, ...)
Works entirely differently to:
plot_expression(..., subset = NULL, ...)
Is not at all intuitive. I'm honestly surprised it's even possible to write code that works that way.
Good points. Ok, I'll remove the NULL default and get it to work with match.call. Will do that after the reports!
I've just been reading Hadley Wickham's Advanced Programming in R (and various other blogs). All are recommending the use of NULL as a default argument (rather than missing or NA). So it seems to be current 'best' practice, even though it isn't very good practice. Can we have a chat about the best way to go? I've used NULL as a default all over the place!
For the record, you can do what Simon wants to do as:
assign("x", NULL)
plot_expression(..., subset = x, ...)
but not intuitive!
Actually, I'm okay with NULL
as a default, but passing an explicit NULL
should, I think, probably function the same, and it doesn't right now. Today, passing NULL
via a variable generates no plots (or even an error), passing it explicitly selects all of them. But, worse, I can't figure any way to actually make subsetting work right. Any use of a variable here seems to break.
And I tried the assign()
version just now, and it errored on me (admittedly in my tests) -- exactly the same as the normal assignment. All of this is pretty easy to test and replicate with my draft pull request setup.
Error (test-config.R:97:5): package works with configurations
Error in `xj[i]`: invalid subscript type 'expression'
Backtrace:
▆
1. └─harsat::plot_assessment(...) at test-config.R:97:5
2. ├─timeSeries[ok, ] at HARSAT/R/graphics_functions.R:114:5
3. └─base::`[.data.frame`(timeSeries, ok, ) at HARSAT/R/graphics_functions.R:114:5
I think, really, what my concern is: what is the "right" R way to allow a variable to be set with some expression that can be injected. I thought the expression()
assignment was reasonable. But I admit I didn't compare to direct use, and I probably should have. I'm trying to focus on what the setups and scripts should be, and not about how it will work inside our functions. Once the API is clear in tests, I can just get on with making it work.
Done a little testing, and there are several cases we need to handle, including the subtle ones.
plot_expression(..., subset = NULL, ...)
NULL
/value (logicals allowed too)assign("x", NULL)
plot_expression(..., subset = x, ...)
plot_expression(..., subset = station_code %in% "A100", ...)
assign("x", expression(station_code %in% "A100"))
plot_expression(..., subset = x, ...)
Superficially, cases 2 and 4 look the same at the parse tree level. That is why things break. My sense is that all four are reasonable and all four should work correctly.
If I can make code that can do all four transparently, at least for this plot case, would that suffice?
Addendum: IMO, the following should not be supported. i.e., we should not let people throw in a string.
assign("x", 'station_code %in% "A100"')
plot_expression(..., subset = x, ...)
assign("x", 'NULL')
plot_expression(..., subset = x, ...)
May be a silly question ... but would this crash-out if ‘x’ is reassigned somewhere between its original definition and it being called in the plot function ... e.g. if x is used as a counter variable somewhere?
Cheers
Simon
From: RobFryer @.> Sent: Thursday, December 7, 2023 3:46 PM To: osparcomm/HARSAT @.> Cc: swamap @.>; Mention @.> Subject: Re: [osparcomm/HARSAT] Fix issues with NULL and subsetting in plots (Issue #386)
For the record, you can do what Simon wants to do as:
assign("x", NULL) plot_expression(..., subset = x, ...)
but not intuitive!
— Reply to this email directly, view it on GitHub https://github.com/osparcomm/HARSAT/issues/386#issuecomment-1845583165 , or unsubscribe https://github.com/notifications/unsubscribe-auth/AR5CXR6MUQCBPVJEUADL2ITYIHQC7AVCNFSM6AAAAABAKIBQNCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBVGU4DGMJWGU . You are receiving this because you were mentioned.Message ID: @.***>
May be a silly question ... but would this crash-out if ‘x’ is reassigned somewhere between its original definition and it being called in the plot function ... e.g. if x is used as a counter variable somewhere? Cheers Simon
@swamap Well, if the variable's value is changed into something nonsensical like zero or pi, possibly. But that's a user error. x
here is purely to save me typing, any variable name is equivalent. Reassigning values from original definition is fine, and actually reasonable. For example, you might actually want to put the plot generation inside a loop, with a variable changing the subsetting each time.
@swamap I was jumping the gun; @morungos is absolutely right, you cannot use the assign to get it to work
The only way you can pass a variable at present is if that variable is a logical (of length 1, or with the same length as the number of rows of assessment_object$timeSeries).
Resolved in #385
This was raised by @swamap. The problem manifested as being unable to use variables to configure a run, e.g.:
The issue is that the code uses
substitute()
, as:So, if
My_plot_subset_choice
was assignedNULL
,substitute(subset)
returnedMy_plot_subset_choice
, which is notNULL
, so we go ahead with the condition anyways. I'm very twitchy about using string expressions anyway. I've drafted up a pull request to explore this, see: #385, which uses expressions better, and does appear to handle all the right cases.Note that
substitute()
is used in quite a few other places too, so the same issue may manifest in other ways.Also note that we do want @swamap to be able to use variables to configure a run, as that will ease all sorts of other interfaces in the future. For that reason, I'm inclined to consider this a bug, and labelled it accordingly.