ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogeneous data
https://docs.ropensci.org/EML

Duplicate person when using `write_eml()` #339

Open peterdesmet opened 2 years ago

peterdesmet commented 2 years ago

The "Creating EML" vignette suggests using `as_emld(R_person)` to concisely encode a person as an EML party.

```r
library(EML)
me <- person("Peter", "Desmet", , "fakeaddress@email.com", "mdc", comment = c(ORCID = "0000-0002-8442-8025"))
my_eml <- list(dataset = list(
  title = "A Minimal Valid EML Dataset",
  creator = as_emld(me),
  contact = as_emld(me)
))
my_eml
#> $dataset
#> $dataset$title
#> [1] "A Minimal Valid EML Dataset"
#> 
#> $dataset$creator
#> individualName:
#>   givenName: Peter
#>   surName: Desmet
#> electronicMailAddress: fakeaddress@email.com
#> '@id': https://orcid.org/0000-0002-8442-8025
#> 
#> $dataset$contact
#> individualName:
#>   givenName: Peter
#>   surName: Desmet
#> electronicMailAddress: fakeaddress@email.com
#> '@id': https://orcid.org/0000-0002-8442-8025
write_eml(my_eml, "ex.xml")
```

Created on 2022-04-29 by the reprex package (v2.0.1)

The generated `emld` object does contain that information correctly. However, the XML written by `write_eml()` contains `individualName` twice in each party:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2" packageId="4ef7c004-cb89-4888-b095-240ecbf18c28" system="uuid" xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd">
  <dataset>
    <title>A Minimal Valid EML Dataset</title>
    <creator id="https://orcid.org/0000-0002-8442-8025">
      <individualName>
        <givenName>Peter</givenName>
        <surName>Desmet</surName>
      </individualName>
      <individualName> <!-- DUPLICATE -->
        <givenName>Peter</givenName>
        <surName>Desmet</surName>
      </individualName>
      <electronicMailAddress>fakeaddress@email.com</electronicMailAddress>
    </creator>
    <contact id="https://orcid.org/0000-0002-8442-8025">
      <individualName>
        <givenName>Peter</givenName>
        <surName>Desmet</surName>
      </individualName>
      <individualName> <!-- DUPLICATE -->
        <givenName>Peter</givenName>
        <surName>Desmet</surName>
      </individualName>
      <electronicMailAddress>fakeaddress@email.com</electronicMailAddress>
    </contact>
  </dataset>
</eml:eml>
```
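To check a written file for this symptom without reading it by eye, one option is to count the `individualName` children of each party. The sketch below is an assumption, not part of the original report: it uses the xml2 package and a hypothetical helper name, `count_individual_names()`. It also assumes, as in the output above, that `dataset`, `creator` and `contact` carry no namespace prefix, so plain XPath names match them. On a file like the one above it would presumably report 2 for both `creator` and `contact`.

```r
library(xml2)

# Hypothetical helper: return the number of <individualName> children
# of each <creator> and <contact> under <dataset>, named by element.
# `x` can be a file path or a literal XML string (read_xml accepts both).
count_individual_names <- function(x) {
  doc <- read_xml(x)
  # Union XPath returns the party nodes in document order.
  parties <- xml_find_all(doc, "//dataset/creator | //dataset/contact")
  counts <- vapply(parties, function(node) {
    length(xml_find_all(node, "./individualName"))
  }, integer(1))
  names(counts) <- xml_name(parties)
  counts
}

# Usage: count_individual_names("ex.xml")
```

Any count above 1 for a single-person party points at the duplication described in this issue.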

Any idea why this is happening? Note that it does not happen when:

cboettig commented 2 years ago

Thanks for reporting. @amoeba may be able to shed more light here, but my shot in the dark is that it's related to the fact that we parse the ORCID identifier as the `id` of the block. I think when you reuse an element in EML you really want to use a reference rather than repeat the element as we do in the example, but that's only really an issue when the element has an `id`.

E.g., can you try the above but without a `comment` element on the person used in the `contact` field? (I could be entirely wrong here, too.)
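The reference-based reuse described above might look like the following. This is a hedged sketch, not a confirmed fix: it assumes that emld serializes a `references` entry inside a party list as an EML `<references>` element pointing at the creator's `id`, and it has not been checked whether this also removes the duplicated `individualName` inside `creator`.

```r
library(EML)

me <- person("Peter", "Desmet", , "fakeaddress@email.com", "mdc",
             comment = c(ORCID = "0000-0002-8442-8025"))

# Describe the party once; as_emld() turns the ORCID comment into '@id'.
creator <- as_emld(me)

my_eml <- list(dataset = list(
  title = "A Minimal Valid EML Dataset",
  creator = creator,
  # Assumption: point at the creator's id instead of repeating the party.
  contact = list(references = creator[["@id"]])
))

write_eml(my_eml, "ex.xml")
```

Inspecting `ex.xml` afterwards (or validating it with `eml_validate()`) would show whether the `id` appears only once and whether the duplication persists.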

peterdesmet commented 2 years ago

Hmm, yes, if I try:

```r
me <- person("Peter", "Desmet", , "fakeaddress@email.com", "mdc")
my_eml <- list(dataset = list(
  title = "A Minimal Valid EML Dataset",
  creator = as_emld(me),
  contact = as_emld(me)
))
write_eml(my_eml, "ex.xml")
```

It doesn't get duplicated.

Too bad; it was pretty useful to be able to call `as_emld()` on a `person`. I guess I'll have to extract those fields and pass them to `set_responsibleParty()`, assigning each property explicitly?
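A possible middle ground is to map the `person` fields to an EML party list by hand, keeping the convenience of `person()` while storing the ORCID as a `userId` instead of the element's `id`. This is a minimal sketch under stated assumptions: `person_to_party()` is a hypothetical helper, and it assumes emld writes the `directory` entry as the attribute of `<userId>` and the inner `userId` entry as its text. It is not the package's own API, and the written XML should be checked.

```r
# Hypothetical helper: map a single utils::person to an EML party list.
# Assumption: emld writes `directory` as the <userId> attribute and the
# inner `userId` entry as its text content; verify with write_eml().
person_to_party <- function(p) {
  stopifnot(inherits(p, "person"), length(p) == 1L)
  party <- list(
    individualName = list(givenName = p$given, surName = p$family)
  )
  if (!is.null(p$email)) {
    party$electronicMailAddress <- p$email
  }
  extra <- p$comment  # named character vector, or NULL
  if ("ORCID" %in% names(extra)) {
    # Keep the ORCID as a userId rather than as the element's id attribute.
    party$userId <- list(
      directory = "https://orcid.org",
      userId = paste0("https://orcid.org/", extra[["ORCID"]])
    )
  }
  party
}

me <- person("Peter", "Desmet", , "fakeaddress@email.com", "mdc",
             comment = c(ORCID = "0000-0002-8442-8025"))

my_eml <- list(dataset = list(
  title = "A Minimal Valid EML Dataset",
  creator = person_to_party(me),
  contact = person_to_party(me)
))
# The list can then be written with EML::write_eml(my_eml, "ex.xml").
```

Because no `id` is set, this avoids the code path cboettig suspects, at the cost of losing the `id` that other elements could reference.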