ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogeneous data
https://docs.ropensci.org/EML

write_eml() orders elements in <dataset> alphabetically #266

Open atn38 opened 5 years ago

atn38 commented 5 years ago

I'm using EML 1.99.0.

When building the list for the dataset object, I assign elements in the order recommended by EML best practices. The resulting dataset list has the correct order, and so does the EML list.

However, when the list is passed to write_eml(), the resulting XML file has its elements in alphabetical order.

I'm not yet able to reproduce this in a minimal reproducible example (MRE); I'll update if I can.

cboettig commented 5 years ago

Hi An,

Thanks, keep me posted. Are you getting an invalid order (e.g. with errors from eml_validate()) or just a different order?

atn38 commented 5 years ago

That's the puzzling thing I forgot to mention: eml_validate() gives an error on a seemingly correctly ordered EML list:

[1] "Element 'abstract': This element is not expected. Expected is one of ( references, alternateIdentifier, shortName, title )."

cboettig commented 5 years ago

Can you give us an MRE for that? The XML validator error messages are not always super helpful; that error often has nothing to do with order per se. It could be a missing required element, I think.

atn38 commented 5 years ago

I've been trying, but I can't replicate the error in an MRE. Here's the actual EML object I'm having trouble with (attached as example_EML.txt; change the extension to .RData).

jeanetteclark commented 5 years ago

It looks like that particular validation error is caused by a typo in your shortName element: you have it listed as shortname.
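As an illustration of how a typo like this could be caught before validation, here is a minimal Python sketch (it is not part of the EML package, which is used from R). It assumes the child names of the dataset list are available as plain strings. `KNOWN_DATASET_CHILDREN` is a hypothetical, abbreviated subset of valid EML dataset element names, not the full schema, and `find_misnamed` is a hypothetical helper that flags unknown names and suggests a case-insensitive match.

```python
# Hypothetical, abbreviated subset of EML <dataset> child element names.
# This is NOT the full schema; extend it from the EML specification as needed.
KNOWN_DATASET_CHILDREN = {
    "alternateIdentifier", "shortName", "title", "creator",
    "pubDate", "abstract", "keywordSet", "intellectualRights",
    "coverage", "contact", "methods", "dataTable",
}


def find_misnamed(names, known=KNOWN_DATASET_CHILDREN):
    """Return {bad_name: suggestion_or_None} for names not in `known`.

    A suggestion is offered only when the name matches a known element
    case-insensitively (e.g. 'shortname' -> 'shortName').
    """
    by_lower = {k.lower(): k for k in known}
    problems = {}
    for name in names:
        if name not in known:
            problems[name] = by_lower.get(name.lower())
    return problems


print(find_misnamed(["title", "shortname", "creator", "abstract"]))
# {'shortname': 'shortName'}
```

A case-insensitive lookup is used because element names in EML are camelCase, so a lowercase slip like shortname is exactly the kind of mistake that is easy to miss by eye.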

atn38 commented 5 years ago

Thanks Jeanette! Quite embarrassing.

Did you try write_eml() on example_EML? Did it order the elements in dataset alphabetically? That was the downstream issue that prompted me to look.

jeanetteclark commented 5 years ago

I've spent a LOT of time over the past two years diagnosing EML validation errors, so no need to be embarrassed :) Eventually you develop an eye for it.

I did run write_eml on your example and saw the same behavior you did. In general, though, write_eml does some strange things when it writes invalid EML (your other issue has another good example), so to me the behavior isn't unexpected. Once your EML is valid, write_eml will put the elements in the correct order.

atn38 commented 5 years ago

It seems eml_validate returns only the first major issue it encounters: once I fixed the shortName typo, it reported another similar issue deeper in the document, with the same write_eml behavior.

So in short: if an element is misnamed, write_eml, for some reason, orders tags alphabetically within the parent element.
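The validator message quoted earlier is consistent with this: once the children are sorted alphabetically, abstract comes first, ahead of title, which the schema expects earlier. The Python sketch below models a validator walking an ordered sequence of elements. `DATASET_SEQUENCE` and `first_unexpected` are hypothetical and heavily simplified (only title is marked required, repeats are not allowed, and the `references` alternative is omitted), so treat this as an illustration of the idea, not as how eml_validate or the underlying XML validator actually work.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified slice of the EML <dataset> sequence: (name, required).
# Only `title` is marked required here; the real schema has more elements,
# more required ones, and a `references` alternative that this sketch omits.
DATASET_SEQUENCE = [
    ("alternateIdentifier", False),
    ("shortName", False),
    ("title", True),
    ("creator", False),
    ("pubDate", False),
    ("abstract", False),
    ("keywordSet", False),
]


def first_unexpected(children: List[str],
                     seq=DATASET_SEQUENCE) -> Optional[Tuple[str, List[str]]]:
    """Walk `children` against an ordered sequence (no repeats allowed).

    Returns (element, expected_names) for the first child that cannot appear
    at its position, or None if every child fits. Required elements missing
    at the end are not reported.
    """
    i = 0  # index of the next unmatched slot in the sequence
    for child in children:
        # Names allowed here: everything up to and including the next required slot.
        window = []
        for name, required in seq[i:]:
            window.append(name)
            if required:
                break
        if child not in window:
            return child, window
        i += window.index(child) + 1
    return None


# Children in schema order pass the check.
print(first_unexpected(["shortName", "title", "creator", "abstract"]))
# None

# Sorting alphabetically puts 'abstract' first, before the required 'title'.
print(first_unexpected(sorted(["shortname", "title", "creator", "abstract"])))
# ('abstract', ['alternateIdentifier', 'shortName', 'title'])
```

The "expected" list stops at the first required slot because an XML sequence cannot skip a required element, which is why the real message lists only the elements up to title.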

Jeanette, do you recommend reporting other issues with write_eml if it's known to be erratic? I seem to keep running into them and can't easily self-diagnose them with eml_validate.

jeanetteclark commented 5 years ago

Well, I wouldn't call write_eml erratic at all. When EML is valid, it behaves exactly as expected, as far as I've seen. As a very hand-wavy explanation (others like Bryce or Carl could give more technical details): the behaviors you are seeing exist because write_eml doesn't know how to handle elements that are not in the schema, but it still tries to write the document (in fact, eml_validate writes a document behind the scenes that is then checked against the schema).

Whether eml_validate reports a bunch of errors at once, or one after another as you fix each, depends, I think, on where the error occurs. I'd be curious to see what your workflow for generating your EML documents looks like. Are you using the eml$...() helpers? They are really useful because you don't have to remember the whole schema (and they help avoid typos!). We have an #eml channel on the NCEAS Slack (join at this link: https://slack.nceas.ucsb.edu/) that might be a good forum for more discussion.

atn38 commented 5 years ago

Jeanette, thanks for the explanation! Valid EML means happiness. That will be a motto. Do I need an invite from NCEAS for the Slack group?

I'm working on R functions to read from a Postgres DB that serves as a common metadata catalog the LTER network is developing. Here's the prototype package repo for the functions: https://github.com/atn38/rMetabase2eml. My workflow involves a little assigning of elements by hand, lots of apply functions, and figuring out the correct list structure for valid EML in different scenarios (tough!). I haven't used the helpers yet, but I plan to later.

jeanetteclark commented 5 years ago

No need for an invite, just use this link to join: https://slack.nceas.ucsb.edu/

Looks like a cool project and a great use of the EML package!