Closed amoeba closed 4 years ago
Hey @cboettig:
I think I'm fine with either setting schemaLocation
or not. I see adding it as a convenience: When set, my text editors automatically validates as I type and shows in-context errors. Also when set, my command line validation tools work out of the box without editing the file. These conveniences are nice when working with a document here or there but add up when doing this all the time.
Considering the other two points you bring up, I can get onboard with not setting schemaLocation
by default and fixing module-level validation as you suggest to make things more consistent. My above convenience points are mitigated in a world where EML uses an https
URI for the namespace which directs people to a copy of the schema so they can set schemaLocation
on their own if needed.
I'm happy to close this PR if we agree.
You make a good case for setting it as being 'good practice' even though it's not required. I think we should set it by default then.
Let's change the PR to
option
and accessed by getOption()
, like we do in eml_version()
function. eml_host
to NULLDoes that make sense?
Not sure if eml_host
is a good name for the option or if we should prefer something more explicit to schemaLocation? Also not sure if doing this through options
is really preferable or if this deserves being exposed through user-facing function arguments (possibly just via ... ?) thoughts?
I wanna think through this a bit...
The user-facing interface for setting schemaLocation
currently is setting it on in the list
representing the document they're generating/editing. I think this is good and should be the way because the list
object is the interface the user is already working with. I think the question here is what we do when it's not set.
So if it's not set, we automatically generate a value for them which would be a side-effect and should be overridable. Good so far. (Aside: Should it be overridable in EML
_and) emld
? EML
's write_eml
currently doesn't have a dedicated argument for this but it has other arguments that are similar to schemaLocation
.)
This makes me want to add support for an option
like schemaLocation
which is just the full attribute value so a user can set it permanently via something like .REnviron
and can easily include multiple mappings (not just one). I think my current eml_schema_location
is too modular and too magical to be user-facing. You rightly noticed this with calling out eml_host
as maybe not being the best way.
Yup, good point! I agree the user way to set it should be in the list construction (or also in an argument to a dedicated constructor function, e.g. the proposed set_eml()
function for packageId
could take an argument). I agree that doing it in options as well is kinda too magical.
I guess the only trick is that it's a bit less obvious how to have a default value that the user can override with NULL
than it is to do the logic for a default NULL
/ omitted value that the user can override by supplying one. I guess one option is to use NA
instead of NULL
, NA
would mean "omit the schemaLocation" while not setting one (which is the same thing as setting the list element to NULL) would mean "use the default schemaLocation". How's that sound?
Hrm, at the risk of discussing this to death, what about TRUE|FALSE|""
?
TRUE
-> Automatically set schemaLocation
to a helpful eml.ecoinformatics.org locationFALSE
-> Do not set schemaLocation
""
-> Set this schemaLocation
. This is kinda like the col_names
argument in read_*
in readr
.This could be exposed at the EML::write_eml
boundary and bubbled down into as_xml*
.
If you aren't a fan, I am happy to do NULL|NA
like above.
@amoeba thanks, yeah, I like your proposal better, I think it's more intuitive than NA
vs NULL
. An update with that would be great!
Also if you have a chance, maybe you want to take a whack at the validator code to remove any use of xsi:schemaLocation
on that end as well? either as part of this PR or separately would be great.
I refactored the schemaLocation argument as discussed above. Please take a look. I'll move on to looking at excising schemaLocation from validation logic.
Edit: And will send a separate PR for that.
thanks for this, current approach looks great!
Closes #44
Couple of things to look at here:
eml_version.R
?