Open billdenney opened 6 years ago
As I was fuzzing the interface after the above error, I found that this error could also be clarified:
set_units(NA_real_, NA, mode="standard")
Error in if (any(unclass(value) != 1)) warning(paste("numeric value", :
missing value where TRUE/FALSE needed
IMO, a missing value should be valid; a missing unit... has no meaning. So I would like to get an error, but definitively a more informative one. NA
s and NULL
s are tricky when it comes to if
statements. Thanks for spotting it!
(sorry, committed this in a branch; will soon be merged)
As I'm working with more real data, I'm finding that this is a common issue for me. I will relatively often have a vector where there are instances of combined NA as both the magnitude and the units.
What do you think about defining a new constant, NA_units_
(similar to what is done in lubridate for dates).
In effect, NA_units_
would allow for NA
as the value for the unit part while normal units would not allow that. It would require that the magnitude part is NA
. All mathematical operations (+, -, *, /, and the % relatives) with NA_units_
would return NA_units_
.
Could you provide an example? Are there any instances of a non-NA number with a NA unit? Because, otherwise, we could simply support NA in Ops.units
by returning NA if any operand is NA.
What I'm working with right now is clinical laboratory data in a clinical study. When a test isn't done for some reason (for example if the patient didn't come in for a scheduled visit), then the value and units for the test are explicitly missing (like d_missing
below).
Immediately, I'm just trying to setup the data, and it looks like the following:
d_no_missing <-
data.frame(Patient_ID=1:3,
Lab_Name=c("Glucose", "Cholesterol", "Cholesterol"),
Lab_Value=c(5, 100, 200),
Lab_Units=c("mmol/L", "mg/dL", "mg/dL"),
stringsAsFactors=FALSE)
d_missing <- d_no_missing
d_missing[2, c("Lab_Value", "Lab_Units")] <- NA
d_no_missing$Value_with_Units <-
mixed_units(d_no_missing$Lab_Value,
d_no_missing$Lab_Units,
mode="standard")
d_missing$Value_with_Units <-
mixed_units(d_missing$Lab_Value,
d_missing$Lab_Units,
mode="standard")
I see. Then one path would be to enable missing units. It requires some rework of the functions that manage the units attribute, but everything else would work seamlessly. Another path would be to return plain NAs and make units work with them. Let's see what @edzer thinks.
Great thought. I see the convenience for your use case, but I'm not convinced we should require for NA_units_
units that the magnitude is only allowed to be NA
. I see numeric
values in units computations as actual values with implicitly missing units. If we allow for NA_units_
,
> set_units(1, km/h) * 2
2 [km/h]
should actually return 2 [NA]
.
I'm not convinced we should require for
NA_units_
units that the magnitude is only allowed to beNA
.
Me neither. If NA_units_
is allowed, we should go for the whole pack, and allow this to be just another (missing) unit. But
set_units(1, km/h) * 2
should actually return2 [NA]
.
I'm not convinced about this. It is quite a change. Where is the boundary between unitless and missing units? It is not clear to me. I wouldn't change the behaviour based on a fuzzy boundary just to support a corner case. Moreover, unitless is a units concept, but missing is an R concept; so unitless is more general for this topic. Therefore, I would default to unitless
(current behaviour) unless NA
is explicitly specified.
In other words, for me, 2
(isolated, by default) has no units. Unless I have more information; unless I know that there is some unit, but I don't know which one it is, so I explicitly should have to specify set_units(2, NA)
.
I agree that would be quite a change, and I was thinking about making this optional. This
> set_units(1, km/h) * 1
1 [km/h]
> set_units(1, km/h) + 1
Error in Ops.units(set_units(1, km/h), 1) :
both operands of the expression should be "units" objects
however annoys me. unitless
is a unit; no unit implies a missing unit. Having the option to catch (and break on) the case where plain numbers are elevated to unitless
would be good. See here.
unitless
is a programming construct that is needed to deal with a programming issue: the case of losing units as a result of operations with units that cancel out. So it is not a unit, conceptually; it literally means no units.
Missing implies different things. Missing means that you know, actually, that there is some unit associated with that data, but it is not known or went missing for some reason. It is quite different to me.
Oh, and I think that the error from set_units(1, km/h) + 1
should be the same as what you get from set_units(1, km/h) + set_units(1, 1)
, i.e.: Error: cannot convert 1 into km/h
.
The conversation morphed a bit. I think that there are a few different items here:
NA_real_
) and known units, like NA km.The first is supported in the current library as is, and it should not change in my opinion.
The second is also already supported in the current library, too, and in my opinion, it should not change.
The third is not supported in the current library, and it supports several operations— mainly around the combination of mixed units. The example I gave above is my typical use case. Another example use case is c()
with a mixed_units object and NA_real_
(or probably any NA).
I don’t think that it’s sensible to have a non-missing magnitude and an NA unit. For example, set_units(2, NA, mode=“standard”)
seems like it should be an error, as it currently is. (There could be a use case that I’m not thinking of, but to me, that is what numeric classes are for.)
Automatically closed by 8c6bda8, which doesn't allow missing units, so I'm reopening this.
I would like to add the following use case to support the implementation of something like NA_unit_
. If I read data in a table in a database, the table may (unfortunately, but that is real life) not be well documented, hence I may know the physical meaning of each column and not know the effective unit of each column. In that case, I would like to specify a missing unit for these columns, while waiting for more information from the database holder.
Dear package maintainers, do you see any chance that this proposal be accepted sooner or later? I would see this notion of missing unit NA_unit
as a valuable feature for this package, and it seems to me that the implementation would not be too difficult.
In my view it should first be clear that:
Now, say we have a vector x
(unitless, or with a non-missing unit, or with a missing unit) and a vector y
with missing unit. Then:
x * y
, y * x
, x/y
, y/x
, x^y
, y^x
should have a missing unit; x + y
and x - y
should not be allowed, since without a known unit we don't know whether x
and y
have the same physical dimension. This is still open for two main reasons:
However, it's easier to gather consensus around something that works, so we could use a PR to refine the proposal. :)
I'm coming back to this issue as I'm working to develop some additional unit conversion tools for laboratory data. I ahve a thought about implementation. What if we define a unit during package load called "NAunits" as a new base unit:
While that would not require a specific magnitude, it would be a simple way to accomplish the goal of a unit value that is NA. Would that be an acceptable method? The downside I see to the above is that it would be confusing because there would be a unit named "NAunits" and a constant defined named "NAunits", but it would still work as expected, as far as I can see:
The next question would be: How should we handle calls to is.na()
? Currently, only the magnitude part appears to be examined. Should we define is.na.units()
to look for the units to be NA_units_
in addition to an NA magnitude? I'd suggest that we simply keep using the magnitude check and ignore the units part for is.na()
.
library(units)
#> udunits database from C:/Users/Bill Denney/Documents/R/win-library/4.1/units/share/udunits/udunits2.xml
install_unit(symbol="NA_units_")
NA_units_ <- set_units(NA_real_, "NA_units_", mode="standard")
set_units(1, "NA_units_")
#> 1 [NA_units_]
set_units(2, NA_units_)
#> 2 [NA_units_]
Created on 2021-10-13 by the reprex package (v2.0.1)
I just had a bug in my code, but it was hard to track down due to an unclear error.
What I just did was, in essence, the following:
I thought that I was getting the error because I wasn't using
mode="standard"
, but it really was due to theNA_character_
.I have two thoughts:
x
andvalue
options toset_units
areNA
, couldNA
be returned? (That could easily be a bad idea for several reasons, and I don't have a strong opinion if the answer is a resounding no.)Error: is.language(x) is not TRUE
error be clarified? I thought I was making it clear that I wasn't sending in a language object when I gaveNA_character_
, so the error didn't make sense to me. (Admittedly, I wasn't intentionally sending inNA_character_
.)A simple fix for option 2 above could possibly be changing the
stopifnot(is.character(x), length(x) == 1)
at the top of as_units.character tostopifnot(is.character(x), length(x) == 1, is.na(x))
.