ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogeneous data
https://docs.ropensci.org/EML

version 1.0.3 `read_eml` removes text if there are text formatting tags in a para with no section #259

Closed jeanetteclark closed 5 years ago

jeanetteclark commented 5 years ago

As the title says, `read_eml` in version 1.0.3 of the package drops text when a `para` element that is not wrapped in a `section` contains text formatting tags (such as `subscript`). Only the formatting tag itself survives. Here is an MRE using two EML documents on dev.nceas.ucsb.edu:

library(dataone)
library(EML) # version 1.0.3

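# Connect to the DataONE staging environment and the KNB test member node
cn_staging <- CNode('STAGING2')
knb_test <- getMNode(cn_staging,'urn:node:mnTestKNB')

# Document whose para contains a subscript tag and is NOT wrapped in a section
eml_no_section <- read_eml(getObject(knb_test, "urn:uuid:ed5b54b3-58b0-46f9-ba9b-40b8f3b880b8"))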
eml_no_section@dataset@methods
#> <methods>
#>   <methodStep>
#>     <description>
#>       <para>
#>         <subscript>2</subscript>
#>       </para>
#>     </description>
#>   </methodStep>
#> </methods>

# Document whose para contains a subscript tag and IS wrapped in a section
eml_section <- read_eml(getObject(knb_test, "urn:uuid:e7e34b8c-2747-40bf-89f2-d508383eec49"))
eml_section@dataset@methods
#> <methods>
#>   <methodStep>
#>     <description>
#>       <section>
#>         <para>some methods say that CO <subscript>2</subscript> needs special formatting</para>
#>       </section>
#>     </description>
#>   </methodStep>
#> </methods>

To me, this does not appear to be a problem using EML 1.99.0, but it might warrant a closer look.

library(dataone)
library(EML) # version 1.99.0

cn_staging <- CNode('STAGING2')
knb_test <- getMNode(cn_staging,'urn:node:mnTestKNB')

eml_no_section <- read_eml(getObject(knb_test, "urn:uuid:ed5b54b3-58b0-46f9-ba9b-40b8f3b880b8")) 
eml_no_section$dataset$methods

#>  $methodStep
#>  $methodStep$description
#>  $methodStep$description$para
#>  [1] "some methods say that CO \n<subscript>2</subscript>\n needs special formatting"
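
For anyone who wants to see the same pattern without access to the test node, here is a minimal sketch of how mixed content (text interleaved with a formatting tag) can lose its text when only element children are considered. It uses the `xml2` package, which is an assumption on my part and not part of the original report, and the inline `para` string is a hypothetical stand-in for the document above. It illustrates the general failure mode only; it is not a claim about how `read_eml` is implemented internally.

```r
library(xml2)

# Hypothetical stand-in for the para from the report (text mixed with markup)
para <- read_xml(
  "<para>some methods say that CO <subscript>2</subscript> needs special formatting</para>"
)

# Element-only view: just the <subscript> child; the surrounding text is not included
element_children <- xml_children(para)
xml_name(element_children)

# Full view: every child node, including the text nodes around <subscript>
all_nodes <- xml_contents(para)
length(all_nodes)

# Concatenated text of all descendant nodes
xml_text(para)
```

A reader that walks only `xml_children()` would see a single `subscript` element here, which matches the truncated output from 1.0.3, while `xml_contents()` exposes all three nodes.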

Thanks to @dmullen17 and @mbjones for helping to sleuth this one out

cboettig commented 5 years ago

Closing as this seems to be resolved in the current version, but lemme know if I'm missing something