Hey @karawoo, thanks for filing, esp. with your reprex.
Your first example code looks quite reasonable. After a little look into the code, I think I see what's going on. When col_classes is specified, set_attributes tries to infer a measurementScale from the provided value. However, a value of numeric may map to either ratio (e.g., degrees K) or interval (e.g., degrees C), but infer_domain_scale assumes numeric should be ratio:
This is the best default IMHO but breaks your quite-reasonable code. You can see it working fine when a measurementScale value of ratio is provided instead of interval:
library(tidyverse)
library(EML)
test <- tribble(
~attributeName, ~attributeDefinition, ~unit, ~numberType, ~measurementScale,
"degrees", "degrees C", "celsius", "real", "ratio"
)
set_attributes(test, col_classes = "numeric") # No errors
The error comes about when infer_domain_scale (1) maps your numeric col class value to ratio and then (2) compares the inferred value (ratio) with your provided value in the measurementScale column of your attributes data.frame.
Note in my above permalink that @cboettig put a comment indicating to me he might have meant to come back to this at some point. I think set_attributes could be reworked a bit here, but with my attempted refactor I ran into some questions about what the desired behavior of this function would be, so I thought I'd comment and see what @cboettig thinks.
I think the thing to do is make infer_domain_scale continue to assume ratio for numeric columns but not error when measurementScale is explicitly set as interval, by changing up the logic a bit.
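To make the proposed behavior concrete, here is a minimal base-R sketch. It is not the code in EML: allowed_scales and resolve_scale are hypothetical helpers, and the class-to-scale mapping is an assumption for illustration only. It shows one way to keep ratio as the default for numeric columns while accepting an explicit interval.

```r
# Hypothetical helper (not part of EML): the measurement scales a column
# class may describe. The first entry is treated as the default.
# This mapping is an assumption made for illustration.
allowed_scales <- function(col_class) {
  switch(col_class,
    numeric   = c("ratio", "interval"),
    character = "nominal",
    stop("unknown col_class: ", col_class)
  )
}

# Hypothetical helper (not part of EML): returns the default scale when
# none is provided, the provided scale when it is compatible with the
# column class, and raises an error otherwise.
resolve_scale <- function(col_class, provided = NA_character_) {
  allowed <- allowed_scales(col_class)
  if (is.na(provided)) {
    return(allowed[[1]])
  }
  if (!provided %in% allowed) {
    stop(sprintf(
      "measurementScale '%s' is not compatible with col_class '%s'",
      provided, col_class
    ))
  }
  provided
}

resolve_scale("numeric")              # falls back to the default scale
resolve_scale("numeric", "interval")  # an explicit interval is accepted
```

The point of the sketch is only that the comparison is made against a set of acceptable scales rather than a single inferred value, so an explicit interval no longer conflicts with the ratio default.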
@amoeba yup, I think your take is spot on; the implementation is assuming ratio always, when it should be handling cases like Kara's example explicitly. (& wow did I write nice helpful comments: # !) Happy for a PR!
PR'd in #297. I started out with a refactor of the logic in infer_domain_scale to support checking against multiple possible values (interval, ratio), but it ended up involving more code than I liked to fit it into how infer_domain_scale was plumbed, so I opted for a lighter patch which just addresses this problem directly. @karawoo's example is included in the test suite to confirm it's fixed. Let me know what you think, @cboettig.
When I try to define attributes with an interval measurement scale, inferring the domain with col_classes = "numeric" fails. I believe the following should be valid:
numericDomain seems valid if I specify it explicitly:
Created on 2020-02-16 by the reprex package (v0.3.0.9001)
I thought these should behave the same, but am I missing something?