Closed jagoldstein closed 5 years ago
I wasn't able to reproduce due to what appears to be a permissions issue, but I could reproduce it with this MRE:
library(EML)
eml <- read_eml(system.file("xsd/test/eml-datasetWithAttributelevelMethods.xml", package = "EML"))
attrs <- get_attributes(eml@dataset@dataTable[[1]]@attributeList)
attrs$attributes[4,"unit"] <- "hectopascal"
set_attributes(attrs$attributes)
After running the code and looking at the warning, I actually prefer the warning as-is. Curious to hear what @cboettig thinks.
Thanks both!
@amoeba side-note, looks like you're still running the S4 EML, in 2.0 your example would be:
library(EML)
eml <- read_eml(system.file("tests/eml-2.1.1/eml-datasetWithAttributelevelMethods.xml", package = "emld"))
attrs <- get_attributes(eml$dataset$dataTable$attributeList)
attrs$attributes[4,"unit"] <- "hectopascal"
set_attributes(attrs$attributes)
Which produces the warning @jagoldstein mentions above. Of course in this case we indeed have not defined hectopascal
yet so the warning is appropriate, but the more general question I think still stands: in the current design, set_attributes
has no access to or knowledge of what customUnits, if any, have not been set. Setting aside the tricky issue of whether set_attributes
ought to be able to check the user's customUnit list first, I agree the warning could be more conditional, i.e.
Please be sure you also define a custom unit in your EML file, or replace with a recognized standard unit. See set_unitList() for details.
instead of
Please define a custom unit or replace with a
recognized standard unit (see set_unitList() for details)
(I would say "be sure you have defined", but using the past tense gives the suggestion that the set_attributes
function may not work properly if you haven't already defined the unit, which is not the case.)
Also raises a more general question: I don't think failing to define custom units causes EML validation to fail? Bryce? Maybe it should?
Great points @cboettig . I think Please be sure a custom unit is defined
might be less confusing than Please be sure you also define...
which could be interpreted as "do this because it's known that you haven't yet".
Alternatively, the message could include a disclaimer You may have already done so
, but that seems too verbose.
I don't think failing to define custom units causes EML validation to fail? Bryce? Maybe it should?
IIRC this does cause a validation error. Right, @jagoldstein ?
EML::eml_validate()
does NOT fail due to custom units.
But there is a failure when arcticdatautils::publish_update()
or arcticdatautils::publish_object
are used to publish an EML with custom units that are undefined.
Output example:
Error in xml document. This EML instance is invalid because referenced id hectopascal does not exist in the given keys.</error>
.
Okay yeah. So that matches my understanding. Missing custom units makes for invalid EML but not necessarily schema-invalid EML which is all the EML package checks. The EML
package's validation routine should be checking more than just schema validity which means checking custom attributes.
@amoeba I think the eml_validate()
function should check for EML validation as defined by the EML spec, e.g. https://github.com/NCEAS/eml/blob/BRANCH_EML_2_2/docs/eml-validation-refs.md, which is more than just checking schema validation, but which does not (afaik) involve any checks to confirm that a custom unit is defined in the file. (I have an open PR on emld
which adds the additional checks described in the spec, see https://github.com/cboettig/emld/pull/26).
I'm all for additional quality checks as separate functions (in articdatautils
, EML
or both) but don't think we should include checks in eml_validate
that are not dictated by the EML specification -- we can't have inconsistent notions of what it means to be valid!
@mbjones thoughts on this? Is it worth including checks that any custom unit is defined as part of the validity specification?
Totally agree. I imagine @mbjones meant to include that in the spec since we enforce it through our EML own validator.
@amoeba um, do you remember where we stand on this one?
skimming the above it sounds like eml_validate()
is behaving as we want it to (insisting custom units be defined), so I think we can close this?
Yes, unless we want to massage the wording a bit.
I'd lobby, and can PR this if you 👍. (I wouldn't be miffed if the code stayed as-is):
> attributeList <- set_attributes(attributes = foo$attributes)
Warning message:
In set_attribute(attributes[i, ], factors = factors) :
Unit 'hectopascal' is not a recognized standard unit; treating as a custom unit. Please be sure you also define a custom unit in your EML record, or replace with a recognized standard unit. See set_unitList() for details.`
Cool, that makes sense. Happy to have a PR to fix the wording
PR'd.
Note this warning output by
EML::set_attributes()
:The warning instructs, "unit 'hectopascal' is not recognized, using custom unit. Please define a custom unit or..." but in this case the custom unit was already defined for 'hectopascal'.
This messaging should be adjusted so that it is not so confidently assertive. Something along the lines of
Please ensure that it has a custom unit definition
. OrThis should be defined as a custom unit if it is not already.
Here is my MRE:
Here is my session info: