This is fairly odd behavior and may be specific to older EML files. I suspect it has to do with the eml_validate() function not being backwards-compatible with EML schema 2.1.1. However, specifying schema 2.1.1 changes, but does not solve, the problem.
I downloaded an older data package with metadata built under EML 2.1.1. I checked the validity of the EML file using https://knb.ecoinformatics.org/emlparser and found that it passed both the XML-specific and EML-specific tests. I then read the file into R, where EML::eml_validate() reported that it contained invalid EML. When I wrote it back to .xml, some aspects of the original EML file were re-arranged. When I re-ran the parser tests at knb.ecoinformatics.org, the newly exported file failed the XML-specific tests. Is the EML package introducing invalid XML into (valid?) EML-formatted .xml files?
I read the file into R using EML::read_eml(), checked whether it validated using EML::eml_validate() (with schemas 2.1.1 and 2.2.0), and then wrote it back to XML using EML::write_eml().
The EML does not validate against schema 2.2.0, which is perhaps not unexpected given that it was created under 2.1.1:
EML::eml_validate(mymeta)
[1] FALSE
attr(,"errors")
[1] "Element 'boundingCoordinates': This element is not expected. Expected is one of ( geographicDescription, references )."
(and 17 additional identical errors are listed)
After switching to schema 2.1.1, the EML still doesn't validate. It appears that, despite the switch, the eml_validate() function is still checking against version 2.2.0, although it no longer seems to have problems with the geography (or is simply not reporting them?).
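The call used to switch schemas is not shown above. The following is a minimal sketch of one way it might be done, assuming that emld::eml_version() sets the default schema version for the session and that the downloaded file sits in the working directory. The helper summarise_validation is hypothetical (it is not part of the EML package) and only collapses the repeated error messages:

```r
# Hypothetical helper (not part of EML): condense an eml_validate() result
# into a validity flag, an error count, and the distinct error messages.
summarise_validation <- function(result) {
  errors <- attr(result, "errors")
  list(
    valid = isTRUE(as.vector(result)),
    n_errors = length(errors),
    unique_errors = unique(errors)
  )
}

path <- "knb-lter-and.4780.4.xml"  # assumed local copy of the metadata file

# Only runs when the packages and the file are actually available.
if (requireNamespace("EML", quietly = TRUE) &&
    requireNamespace("emld", quietly = TRUE) &&
    file.exists(path)) {
  emld::eml_version("eml-2.1.1")   # assumption: switches the default schema version
  mymeta <- EML::read_eml(path)
  print(summarise_validation(EML::eml_validate(mymeta)))
}
```

Comparing the unique_errors reported under each schema version would show whether the switch changes what is being checked or only what is being reported.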
In any case, I can then write the object back to .xml:
EML::write_eml(mymeta, "exportedEML.xml")
The newly exported "exportedEML.xml" file now contains the namespace conflicts described in issue #347, despite my having specified that the EML 2.1.1 schema should be used before calling EML::write_eml().
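One way to see which namespaces each file declares is to list them with the xml2 package. This is a sketch, assuming xml2 is installed; declared_namespaces is a hypothetical helper, and the inline document is a stand-in so the example runs without the real files:

```r
# Hypothetical helper (not part of EML): list the namespaces a document declares.
declared_namespaces <- function(x) {
  doc <- xml2::read_xml(x)      # accepts a file path or a literal XML string
  unclass(xml2::xml_ns(doc))    # named character vector: prefix -> URI
}

# Stand-in document; in practice pass "knb-lter-and.4780.4.xml" or
# "exportedEML.xml" and compare the two results.
sample_xml <- paste0(
  '<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" ',
  'packageId="demo" system="demo"><dataset/></eml:eml>'
)
ns <- declared_namespaces(sample_xml)
print(ns)
```

If the exported file declares a 2.2.0 namespace while the original declares 2.1.1, that would suggest the schema setting was not applied at write time.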
When I now check exportedEML.xml using the EML parser at https://knb.ecoinformatics.org/emlparser/, I find that although it passes the EML-specific tests, it fails the XML-specific tests:
XML specific tests: Failed
The following errors were found:
cvc-complex-type.2.4.a: Invalid content was found starting with element 'boundingCoordinates'. One of '{geographicDescription, references}' is expected.
Has the EML package introduced invalid XML into the file?
Further comparison of the .xml files indicates that various elements in the original knb-lter-and.4780.4.xml have been re-arranged in exportedEML.xml. Specifically, the original knb file lists 18 elements that share one general format, whereas in exportedEML.xml the corresponding elements are arranged differently.
The children of these elements have been re-arranged in alphabetical order, which seems to be the default approach for EML::write_eml() when handling invalid EML. Except in this case, was the EML invalid? knb's EML parser says it was valid. If the original file was valid EML, then the EML package appears to be taking valid EML and turning it into a form that does not pass the XML tests (or the EML::eml_validate() test). Either way, I would not expect reading and then writing a (valid?) EML file to introduce these sorts of changes.
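The re-ordering can be checked programmatically rather than by eye. This is a sketch, assuming the xml2 package is installed and that the affected parent element is geographicCoverage (an inference from the validation error, which names boundingCoordinates and geographicDescription). The helper child_order is hypothetical, and the two inline documents are stand-ins for the original and exported files:

```r
# Hypothetical helper (not part of EML): for every `parent` element in a
# document, return the names of its children in document order.
child_order <- function(x, parent = "geographicCoverage") {
  doc <- xml2::read_xml(x)   # accepts a file path or a literal XML string
  nodes <- xml2::xml_find_all(doc, paste0("//", parent))
  lapply(seq_along(nodes), function(i) {
    xml2::xml_name(xml2::xml_children(nodes[[i]]))
  })
}

# Stand-in documents; in practice pass the two file paths instead.
original_xml <- paste0(
  "<coverage><geographicCoverage>",
  "<geographicDescription>demo</geographicDescription><boundingCoordinates/>",
  "</geographicCoverage></coverage>"
)
exported_xml <- paste0(
  "<coverage><geographicCoverage>",
  "<boundingCoordinates/><geographicDescription>demo</geographicDescription>",
  "</geographicCoverage></coverage>"
)

before <- child_order(original_xml)
after <- child_order(exported_xml)

# TRUE when the exported children are just the originals sorted alphabetically.
alphabetised <- identical(sort(before[[1]]), after[[1]])
print(alphabetised)
```

Running the same comparison on the real files, with length(before) as a check on the count of 18, would confirm whether every affected element was re-ordered in the same way.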
For reference, the data package is https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-and.4780.4. Before reading it into R, I ran the file "knb-lter-and.4780.4.xml" through the EML parser at https://knb.ecoinformatics.org/emlparser/, and it passed both the XML-specific and EML-specific tests.