ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogenous data
https://docs.ropensci.org/EML

multiple geographicCoverage elements don't make it into written EML #265

Open atn38 opened 5 years ago

atn38 commented 5 years ago

Hey @cboettig,

I've got an EML list object created in EML v 1.99.0 with multiple geographicCoverage elements (more than 2, in fact):

EML$dataset$coverage$geographicCoverage

$geographicCoverage
$geographicCoverage$`1`
$geographicCoverage$`1`$geographicDescription
[1] "Arroyo Burro reef: Arroyo Burro Reef is located on the Santa Barbara Channel near the mouth of Arroyo Burro Creek and Beach. Depth ranges from 5.4 to 7 meters."

$geographicCoverage$`1`$boundingCoordinates
$geographicCoverage$`1`$boundingCoordinates$westBoundingCoordinate
[1] "-119.822502"

$geographicCoverage$`1`$boundingCoordinates$eastBoundingCoordinate
[1] "-119.822502"

$geographicCoverage$`1`$boundingCoordinates$northBoundingCoordinate
[1] "34.4138298"

$geographicCoverage$`1`$boundingCoordinates$southBoundingCoordinate
[1] "34.4138298"

$geographicCoverage$`2`
$geographicCoverage$`2`$geographicDescription
[1] "Arroyo Hondo Reef: Arroyo Hondo Reef is located on the Santa Barbara Channel near the east end of Gaviota State Park, CA. Depth ranges from -4.3m to -6.6 meters. "

$geographicCoverage$`2`$boundingCoordinates
$geographicCoverage$`2`$boundingCoordinates$westBoundingCoordinate
[1] "-120.144402"

$geographicCoverage$`2`$boundingCoordinates$eastBoundingCoordinate
[1] "-120.144402"

$geographicCoverage$`2`$boundingCoordinates$northBoundingCoordinate
[1] "34.4724007"

$geographicCoverage$`2`$boundingCoordinates$southBoundingCoordinate
[1] "34.4724007"

However, write_eml(EML, file = "file.xml") only takes the last element in the list above and outputs:

<geographicCoverage>list(eastBoundingCoordinate = "-119.7155", northBoundingCoordinate = "34.0444984", southBoundingCoordinate = "34.0444984", westBoundingCoordinate = "-119.7155")Santa Cruz Island, Twin Harbor West reef: Twin Harbor West Reef is located on the north shore of Santa Cruz Island, in the Santa Barbara Channel Islands, CA. Depth ranges from -3.0 to -15 meters. Twin Harbor West Reef is located on the north shore of Santa Cruz Island, in the Santa Barbara Channel Islands, CA. Depth ranges from -3.0 to -15 meters. </geographicCoverage>

Edit: on closer inspection, this happens whenever a named list is assigned to an element (in this case the names are just 1, 2, etc.). I have the same issue with the creator element, while unitList, which holds an unnamed list, is written correctly.
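A quick way to spot the pattern described in the edit is to check whether a list carries purely numeric names. The following is a minimal base R sketch; `has_index_names` is a hypothetical helper written for illustration and is not part of the EML package.

```r
# Hypothetical helper (not part of the EML package): TRUE when a list
# carries purely numeric names such as "1", "2".
has_index_names <- function(x) {
  nms <- names(x)
  is.list(x) && !is.null(nms) && all(grepl("^[0-9]+$", nms))
}

# Illustrative data only: the same two coverages, with and without names.
named_covs <- list(`1` = list(geographicDescription = "site one"),
                   `2` = list(geographicDescription = "site two"))
unnamed_covs <- list(list(geographicDescription = "site one"),
                     list(geographicDescription = "site two"))

has_index_names(named_covs)    # TRUE
has_index_names(unnamed_covs)  # FALSE
```

A single coverage such as list(geographicDescription = "site one") also returns FALSE, because its names are schema element names rather than indices.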

jeanetteclark commented 5 years ago

Hi @atn38

EML v 1.99.0 relies on nested, named lists to structure a valid EML document. I haven't reproduced your example, but I imagine that adding your own named lists, with names that are not in the EML schema, is confusing the parser that converts your EML list to valid XML. You can add multiple geographic coverages as an unnamed list just fine (same with creator).
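As a sketch of the unnamed-list form mentioned above (descriptions and coordinates are placeholder values, not taken from the original dataset), each coverage is a named list of schema elements, while the outer list that holds them has no names:

```r
# Two coverages as an unnamed outer list. Values are illustrative placeholders.
covs <- list(
  list(geographicDescription = "description one",
       boundingCoordinates = list(westBoundingCoordinate = -120.1,
                                  eastBoundingCoordinate = -120.0,
                                  northBoundingCoordinate = 34.5,
                                  southBoundingCoordinate = 34.4)),
  list(geographicDescription = "description two",
       boundingCoordinates = list(westBoundingCoordinate = -119.9,
                                  eastBoundingCoordinate = -119.8,
                                  northBoundingCoordinate = 34.3,
                                  southBoundingCoordinate = 34.2))
)

# The coverage element then holds the unnamed list directly.
coverage <- list(geographicCoverage = covs)

is.null(names(coverage$geographicCoverage))  # TRUE: the outer list is unnamed
```

The resulting coverage list can be placed under dataset$coverage in the EML list before validating and writing.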

You can confirm that your document is valid before writing it by using the function eml_validate, if you are not doing so already.

atn38 commented 5 years ago

Thanks @jeanetteclark.

Setting names to NULL on the list that gets assigned to geographicCoverage solves the problem. It didn't seem to have anything to do with my own list or the EML schema; rather, some prior processing left a named list. eml_validate didn't mention anything relevant to this issue.
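For illustration, here is one hypothetical way prior processing can leave such names behind, and how to strip them. This is a base R sketch with made-up site values; it is an assumption about a typical workflow, not a reconstruction of the original script.

```r
# Illustrative site table; the values are placeholders.
sites <- data.frame(
  description = c("description one", "description two"),
  lon = c(-119.8, -120.1),
  lat = c(34.41, 34.47),
  stringsAsFactors = FALSE
)

# split() names each piece after its group ("1", "2"), and lapply() keeps
# those names on its result.
covs <- lapply(split(sites, seq_len(nrow(sites))), function(s) {
  list(geographicDescription = s$description,
       boundingCoordinates = list(westBoundingCoordinate = s$lon,
                                  eastBoundingCoordinate = s$lon,
                                  northBoundingCoordinate = s$lat,
                                  southBoundingCoordinate = s$lat))
})
names(covs)  # "1" "2"

# Drop only the outer names; the element names inside each coverage are kept.
covs <- unname(covs)  # same effect as names(covs) <- NULL
names(covs)  # NULL
```

After this step, covs can be assigned to dataset$coverage$geographicCoverage as an unnamed list.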

jeanetteclark commented 5 years ago

I'm glad you found a solution!

Regarding the schema: by adding an arbitrarily named list (your list of geo coverages) into your EML, you have generated schema-invalid metadata. eml_validate will catch this and show the validation errors. Below is a minimal reproducible example (MRE) of your issue showing how eml_validate catches it:

library(EML)

contacts <- list(individualName = list(givenName = "Jeanette", surName = "Clark"))

covs <- list(`1` = list(geographicDescription = "description one",
                        boundingCoordinates = list(westBoundingCoordinate = 120,
                                                   eastBoundingCoordinate = 121,
                                                   northBoundingCoordinate = 12,
                                                   southBoundingCoordinate = 13)),
             `2` = list(geographicDescription = "description two", 
                        boundingCoordinates = list(westBoundingCoordinate = -120,
                                                   eastBoundingCoordinate = -121,
                                                   northBoundingCoordinate = -12,
                                                   southBoundingCoordinate = -13)))

my_eml <- list(packageId = "id", system = "system",
               dataset = list(
                 title = "A Minimal Valid EML Dataset",
                 creator = contacts,
                 contact = contacts,
                 coverage = list(geographicCoverage = covs)))

eml_validate(my_eml)
#> [1] FALSE
#> attr(,"errors")
#> [1] "Element 'geographicCoverage': Character content other than whitespace is not allowed because the content type is 'element-only'."
#> [2] "Element 'geographicCoverage': Missing child element(s). Expected is one of ( geographicDescription, references )."