Open atn38 opened 5 years ago
Hi An,
thanks, keep me posted. Are you getting an invalid order (e.g. with errors from eml_validate()) or just a different order?
That's the puzzling thing I forgot to include. eml_validate() gives an error on a seemingly correctly ordered EML list:
[1] "Element 'abstract': This element is not expected. Expected is one of ( references, alternateIdentifier, shortName, title )."
Can you give us an MRE for that? The XML validator error messages are not always super helpful; that error often doesn't have anything to do with order per se. It could be a missing required element, I think.
I've been trying to, but I can't replicate the error with an MRE. Here's the actual EML object I have trouble with (change the extension to .RData): example_EML.txt
It looks like that particular validation error is because you have a typo in your shortName element. You have it listed as shortname.
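A case-only typo like this one can be caught before validation by comparing element names against the expected spellings while ignoring letter case. The following is a minimal base R sketch, not part of the EML package: find_case_typos() is a hypothetical helper, and known_names covers only the elements named in the error message above plus abstract, not the full schema.

```r
# Element names taken from the validator message in this thread; this is only
# a small subset of what the EML schema allows under <dataset>.
known_names <- c("references", "alternateIdentifier", "shortName", "title",
                 "abstract")

# Hypothetical helper: report names that match a known element only when
# letter case is ignored, which is the signature of a typo like `shortname`.
find_case_typos <- function(x, known = known_names) {
  nms <- names(x)
  suspect <- !(nms %in% known) & (tolower(nms) %in% tolower(known))
  data.frame(
    found = nms[suspect],
    expected = known[match(tolower(nms[suspect]), tolower(known))],
    stringsAsFactors = FALSE
  )
}

# Made-up dataset list containing the same kind of typo.
dataset <- list(
  title = "Example dataset",
  shortname = "example",   # typo: the schema spells this shortName
  abstract = "Placeholder abstract."
)

typos <- find_case_typos(dataset)
print(typos)
```

This only checks the top level of the list; a nested list would need the same check applied recursively.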
Thanks Jeanette! Quite embarrassing.
Did you try write_eml() on example_EML? Did it order elements in dataset alphabetically? That was the downstream issue that prompted me to look.
I've spent a LOT of time over the past two years diagnosing EML validation errors, so no need to be embarrassed :) eventually you get a bit of an eye for it
I did run write_eml on your example and I saw the same behavior you did. In general though, write_eml does some strange things when it writes invalid EML (your other issue has another good example), so to me the behavior isn't unexpected. Once your EML is valid, write_eml will put the elements in the correct order.
It seems eml_validate returns only the first major issue it encounters: once I fixed the shortName typo, it returned one other similar issue deeper in, with similar write_eml behavior.
So in short: if an element is misnamed, write_eml, for some reason, orders tags alphabetically within the parent element.
Jeanette, do you recommend reporting other issues with write_eml if it's known to be erratic? I seem to keep running into them and can't self-diagnose easily with eml_validate.
Well, I wouldn't call write_eml erratic at all. When EML is valid, it behaves exactly as expected as far as I've seen. As a very hand-wavey explanation (others like Bryce or Carl could give more technical details): the behaviors you are seeing exist because write_eml doesn't know how to handle elements that are not in the schema, but it will still try to write the document (in fact, eml_validate writes a document behind the scenes that is then checked against the schema).
Whether you get a bunch of errors with eml_validate at once, or one after another as you fix each one, will depend, I think, on where the error occurs. I'd be curious to see what your workflow looks like for generating your EML documents. Are you using the eml$...() helpers? These are really useful so that you don't have to remember the whole schema (and they help avoid typos!). We have an #eml channel over on the NCEAS Slack (join at this link: https://slack.nceas.ucsb.edu/) that might be a good forum for more discussion.
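For reference, building a document with the eml$...() helpers might look roughly like the sketch below. It follows the pattern shown in the EML package README and assumes an installed version of EML that provides the eml$ helpers; the person, title, packageId, and system values are made-up placeholders.

```r
library(EML)

# Placeholder person reused as creator and contact (values are made up).
me <- list(individualName = list(givenName = "Jane", surName = "Doe"))

# Argument names are the schema's element names, so editor autocompletion
# on eml$dataset( helps avoid spellings like `shortname`.
dataset <- eml$dataset(
  title = "Example dataset",
  shortName = "example",
  creator = me,
  contact = me
)

doc <- eml$eml(
  packageId = "example.1.1",  # placeholder identifier
  system = "example",         # placeholder system name
  dataset = dataset
)

# Write to a temporary file, then validate the written file.
path <- tempfile(fileext = ".xml")
write_eml(doc, path)
result <- eml_validate(path)
print(result)
```

If validation fails, the messages attached to the result are the place to start looking, keeping in mind the caveat above that they do not always point at the real cause.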
Jeanette, thanks for the explanation! Valid EML means happiness. That will be a motto. Do I need an invite from NCEAS for slack group?
I'm working on R functions to read in from a Postgres DB, which is a common metadata catalog the LTER network is working on. Here's the proto package repo for the functions: https://github.com/atn38/rMetabase2eml The workflow is a little assigning elements by hand, lots of apply functions, and figuring out the correct list structure for valid EML in different scenarios (tough!). I haven't used the helpers yet, but do plan to later.
No need for an invite, just use this link to join: https://slack.nceas.ucsb.edu/
Looks like a cool project and a great use of the EML package!
I'm using EML 1.99.0.
When making the list for the dataset object, I assign elements in the order recommended by EML best practices. The resulting dataset list has the correct order, and so does the EML list.
However, when it is passed into write_eml(), the resulting XML file has its elements in alphabetical order.
I'm not yet able to reproduce this in an MRE; I will update if I can.