ropensci / emld

:package: JSON-LD representation of EML
https://docs.ropensci.org/emld
Other
13 stars 6 forks source link

Change default schemaLocation so it points at a URL #45

Closed amoeba closed 4 years ago

amoeba commented 4 years ago

Closes #44

Couple of things to look at here:

amoeba commented 4 years ago

Hey @cboettig:

I think I'm fine with either setting schemaLocation or not. I see adding it as a convenience: When set, my text editors automatically validates as I type and shows in-context errors. Also when set, my command line validation tools work out of the box without editing the file. These conveniences are nice when working with a document here or there but add up when doing this all the time.

Considering the other two points you bring up, I can get onboard with not setting schemaLocation by default and fixing module-level validation as you suggest to make things more consistent. My above convenience points are mitigated in a world where EML uses an https URI for the namespace which directs people to a copy of the schema so they can set schemaLocation on their own if needed.

I'm happy to close this PR if we agree.

cboettig commented 4 years ago

You make a good case for setting it as being 'good practice' even though it's not required. I think we should set it by default then.

Let's change the PR to

Does that make sense?

Not sure if eml_host is a good name for the option or if we should prefer something more explicit to schemaLocation? Also not sure if doing this through options is really preferable or if this deserves being exposed through user-facing function arguments (possibly just via ... ?) thoughts?

amoeba commented 4 years ago

I wanna think through this a bit...

The user-facing interface for setting schemaLocation currently is setting it on in the list representing the document they're generating/editing. I think this is good and should be the way because the list object is the interface the user is already working with. I think the question here is what we do when it's not set.

So if it's not set, we automatically generate a value for them which would be a side-effect and should be overridable. Good so far. (Aside: Should it be overridable in EML _and) emld? EML's write_eml currently doesn't have a dedicated argument for this but it has other arguments that are similar to schemaLocation.)

This makes me want to add support for an option like schemaLocation which is just the full attribute value so a user can set it permanently via something like .REnviron and can easily include multiple mappings (not just one). I think my current eml_schema_location is too modular and too magical to be user-facing. You rightly noticed this with calling out eml_host as maybe not being the best way.

cboettig commented 4 years ago

Yup, good point! I agree the user way to set it should be in the list construction (or also in an argument to a dedicated constructor function, e.g. the proposed set_eml() function for packageId could take an argument). I agree that doing it in options as well is kinda too magical.

I guess the only trick is that it's a bit less obvious how to have a default value that the user can override with NULL than it is to do the logic for a default NULL / omitted value that the user can override by supplying one. I guess one option is to use NA instead of NULL, NA would mean "omit the schemaLocation" while not setting one (which is the same thing as setting the list element to NULL) would mean "use the default schemaLocation". How's that sound?

amoeba commented 4 years ago

Hrm, at the risk of discussing this to death, what about TRUE|FALSE|""?

This could be exposed at the EML::write_eml boundary and bubbled down into as_xml*.

If you aren't a fan, I am happy to do NULL|NA like above.

cboettig commented 4 years ago

@amoeba thanks, yeah, I like your proposal better, I think it's more intuitive than NA vs NULL. An update with that would be great!

Also if you have a chance, maybe you want to take a whack at the validator code to remove any use of xsi:schemaLocation on that end as well? either as part of this PR or separately would be great.

amoeba commented 4 years ago

I refactored the schemaLocation argument as discussed above. Please take a look. I'll move on to looking at excising schemaLocation from validation logic.

Edit: And will send a separate PR for that.

cboettig commented 4 years ago

thanks for this, current approach looks great!