Open amoeba opened 4 years ago
While I fully appreciate the motivation for providing a `packageId` automatically, I am still uncomfortable with doing so in `write_eml`, which is meant to simply be a serialization of an in-memory object. Side-effects that aren't in user control often cause problems, and I think what is written to disk should be round-trippable with the in-memory object.
Instead of changing content, couldn't we simply provide warnings for any validation errors (including a missing packageId) so the user will know that their document doesn't validate when it is written to disk? This would allow folks that want to export a partial document for further downstream processing to do so, while still helping to prompt when required fields are missing.
@mbjones Yeah, I'm with you :100: on that.
I think we should also add a helper routine to `add_packageId()` that could default to the current UUID method to avoid the possible complexity associated with this task. That would be far more transparent than doing it on `write`.
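A minimal sketch of what such a helper might look like. Everything here is an assumption for illustration: the signature, the `id` and `system` arguments, and the idea that the `eml` object can be treated as a plain list with `packageId` and `system` elements. The default relies on `uuid::UUIDgenerate()` from the `uuid` package, which is only evaluated if no `id` is supplied.

```r
# Hypothetical helper: neither this signature nor its arguments come from the
# EML package. Assumes `eml` is a plain list with optional `packageId` and
# `system` elements.
add_packageId <- function(eml, id = uuid::UUIDgenerate(), system = "uuid") {
  if (!is.null(eml$packageId)) {
    # Never overwrite an identifier the user has already chosen.
    return(eml)
  }
  eml$packageId <- id   # the default `id` is only evaluated here, if needed
  eml$system <- system
  eml
}

# Usage with an explicit identifier, so the example needs no extra packages.
doc <- list(dataset = list(title = "Example dataset"))
doc <- add_packageId(doc, id = "example-id-1", system = "local")
doc$packageId
```

Because the call is explicit, the in-memory object and the file on disk stay in step, which is the round-tripping concern raised above.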
Perhaps we can also add some additional logic into the validator which would report that the EML was valid except for a missing `packageId` (possibly with a suggestion on how to add one?), e.g.:
EML file is valid except for a missing packageId. You can create one now with `add_packageId()`
I've always been a bit annoyed at the standard XML validation errors not being as user-friendly or comprehensive as they should be.
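As a rough sketch of that idea, the validator's raw messages could be post-processed into a friendlier summary. The function name below is hypothetical, and it assumes that the validator's errors are available as a character vector and that a missing `packageId` shows up as a message containing the text "packageId". The error string in the usage example is made up for illustration.

```r
# Hypothetical helper; not part of the EML package.
# `errors` is assumed to be a character vector of validator messages.
summarise_validation <- function(errors) {
  if (length(errors) == 0) {
    return("EML file is valid.")
  }
  # Assumption: a missing packageId is reported in a message that names it.
  about_id <- grepl("packageId", errors, fixed = TRUE)
  if (all(about_id)) {
    return(paste(
      "EML file is valid except for a missing packageId.",
      "You can create one now with `add_packageId()`"
    ))
  }
  # Otherwise fall back to listing every problem.
  paste0("EML file is not valid:\n", paste0("- ", errors, collapse = "\n"))
}

# Made-up error text, for illustration only.
msg <- summarise_validation("The attribute 'packageId' is required but missing.")
cat(msg, "\n")
```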
Thoughts?
Sounds great!
Stemming from #292,
Right now, when an `eml` object is written to disk with `write_eml` without a `packageId`, `write_eml` fills in `packageId` with a UUID and sets `system` to `uuid`:
https://github.com/ropensci/EML/blob/6c2911c4001a60e9a838059ec3c0c8fd7018f6a2/R/write_eml.R#L27-L31
In #292, this created some understandable confusion for @scelmendorf because it meant that `eml_validate`'s behavior was inconsistent depending on whether the object was written to disk first or not.
In #292, @cboettig wrote:
Also in #292, @mobb wrote:
I agree with the above and it seems we have a consensus but I wanted to file a standalone issue for discussion in case it was needed. Open to comments or suggested wording of the warnings/messages.
I propose:
- If `packageId` is not set when `write_eml` is called, issue a `warning`. Suggested wording: "`packageId` generated automatically because it was not already set. See ?packageId for more information."
- Add a help page at `?packageId` with more information. TBD. It'd explain how to set `packageId` and `system` and how to come up with sensible values.