ropenscilabs / deposits

R Client for access to multiple data repository services
https://docs.ropensci.org/deposits/

Parse multiple metadata sources in potentially different formats #63

Closed: mpadge closed this issue 1 year ago

mpadge commented 1 year ago

This follows on from #62. An example is Zenodo's "keywords" and "subjects" terms, both of which have to be part of the single DCMI "subject" field. #62 implemented "anyOf" options for that field (see https://github.com/ropenscilabs/deposits/blob/3741020bc91029fc7457eba79620b55d31fd7298/inst/extdata/dc/schema.json#L622-L633), meaning that there are two ways to define such arrays:

subject <- list (keywords = list ("one", "two"))

as specified by the schema chunk linked above, or via the generic text version, which is still allowed:

subject <- "## keywords\none\ntwo"

Both of these have to be parsed to generate the same result. They may also have to be parsed as "multiple sources", so the following must also achieve the same result, parsing into the two Zenodo fields "keywords" and "subjects":

subject <- list (keywords = list ("one", "two"), subjects = list ("this", "that"))
subject <- "## keywords\none\ntwo\n## subjects\nthis, that"
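The equivalence required above can be sketched in base R. This is only an illustration under stated assumptions, not how deposits actually parses the field: the helper `parse_md_subject()` is hypothetical, and it assumes that each `## ` heading names a target field and that items are separated by newlines or commas.

```r
# Hypothetical helper (not part of deposits): convert the markdown-style
# text form of a DCMI "subject" into the equivalent named-list form.
# Assumes each "## " heading names a field, and items are separated by
# newlines or commas.
parse_md_subject <- function (x) {
    # Split into one section per "## " heading (the leading "\n" lets the
    # first heading match the same pattern as later ones):
    sections <- strsplit (paste0 ("\n", x), "\n## ", fixed = TRUE) [[1]]
    sections <- sections [nzchar (sections)]
    # Split each section into lines; the first line is the field name:
    lines <- lapply (sections, function (s) strsplit (s, "\n", fixed = TRUE) [[1]])
    items <- lapply (lines, function (l) {
        terms <- trimws (unlist (strsplit (l [-1], ",", fixed = TRUE)))
        as.list (terms [nzchar (terms)])
    })
    names (items) <- vapply (lines, function (l) l [1], character (1))
    items
}

# Both forms from the issue should give the same result:
text_form <- "## keywords\none\ntwo\n## subjects\nthis, that"
list_form <- list (keywords = list ("one", "two"), subjects = list ("this", "that"))
identical (parse_md_subject (text_form), list_form)
#> [1] TRUE
```

Treating commas and newlines alike is one possible design choice; it would let a line such as `"two, three"` expand into two separate keywords.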

mpadge commented 1 year ago

Those commits implement most of it:

library (deposits)
packageVersion ("deposits")
#> [1] '0.1.1.31'

service <- "zenodo"
metadata <- list (
    title = "New Title",
    abstract = "This is the abstract",
    creator = list (list (name = "A. Person"), list (name = "B. Person")),
    description = paste0 (
        "## description\nThis is the description\n\n",
        "## version\n1.0"
    ),
    subject = "## keywords\none\ntwo, three\n\n## subjects\nthis, that"
)
# The two validation calls made during main client initialization:
metadata_dcmi <- validate_dcmi_metadata (metadata)
metadata_service1 <- translate_dc_to_service (metadata_dcmi, service = service)
print (metadata_service1)
#> ...
#> 
#> $metadata$keywords
#> $metadata$keywords[[1]]
#> [1] "one"
#> 
#> $metadata$keywords[[2]]
#> [1] "two"
#> 
#> $metadata$keywords[[3]]
#> [1] "three"
#> 
#> 
#> $metadata$subjects
#> $metadata$subjects[[1]]
#> [1] "this"
#> 
#> $metadata$subjects[[2]]
#> [1] "that"
#> 
#> ...

metadata$subject <- list (
    keywords = list ("one", "two", "three"),
    subjects = list ("this", "that")
)
metadata_dcmi <- validate_dcmi_metadata (metadata)
metadata_service2 <- translate_dc_to_service (metadata_dcmi, service = service)
print (metadata_service2)
#> ...
#> 
#> $metadata$keywords
#> $metadata$keywords[[1]]
#> [1] "one"
#> 
#> $metadata$keywords[[2]]
#> [1] "two"
#> 
#> $metadata$keywords[[3]]
#> [1] "three"
#> 
#> 
#> $metadata$subjects
#> $metadata$subjects[[1]]
#> [1] "this"
#> 
#> $metadata$subjects[[2]]
#> [1] "that"
#> 
#> ...

# Drop the 'created' fields, which are not expected to match, before comparing:
metadata_service2$created <- metadata_service1$created <- NULL
identical (metadata_service1, metadata_service2)
#> [1] TRUE

Created on 2023-04-18 with reprex v2.0.2

It just needs to be tested, and documented via #26.

mpadge commented 1 year ago

Note that the code above no longer works as shown. The subject field in the Zenodo schema had not been properly specified; it actually follows a defined structure, which has now been implemented. Passing a subject to Zenodo now requires this:

metadata <- list (
    title = "New Title",
    abstract = "This is the abstract",
    creator = list (list (name = "A. Person"), list (name = "B. Person")),
    description = paste0 (
        "## description\nThis is the description\n\n",
        "## version\n1.0"
    ),
    subject = list (
        keywords = list ("one", "two", "three"),
        subjects = list (list (term = "this", identifier = "https://this"),
                         list (term = "that", identifier = "https://that"))
    )
)
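Writing out one list per subject by hand gets verbose as the number of subjects grows. As a hedged sketch, a small helper could generate the same structure from two parallel vectors; `make_zenodo_subjects()` is hypothetical and not part of deposits.

```r
# Hypothetical helper (not part of deposits): build the list-of-lists
# structure for Zenodo "subjects" from parallel vectors of terms and URIs.
make_zenodo_subjects <- function (terms, identifiers) {
    stopifnot (length (terms) == length (identifiers))
    # One list (term = ..., identifier = ...) per subject; "scheme" is
    # omitted because Zenodo auto-fills it. unname() drops the names that
    # Map() would otherwise copy from the character vector 'terms'.
    unname (Map (function (term, id) list (term = term, identifier = id),
                 terms, identifiers))
}

subjects <- make_zenodo_subjects (
    terms = c ("this", "that"),
    identifiers = c ("https://this", "https://that")
)
expected <- list (list (term = "this", identifier = "https://this"),
                  list (term = "that", identifier = "https://that"))
identical (subjects, expected)
#> [1] TRUE
```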

Zenodo's subject specification requires a "term" and an "identifier" for each item (with an additional "scheme" term auto-filled by Zenodo). There is no simple way to express that kind of key-value array of subjects in plain markdown/text form, so a text equivalent is not really possible. Complex structures like that have to be specified as lists.
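The limitation can be made concrete with a structural check. The sketch below is hypothetical (deposits' own validation may work quite differently): `check_zenodo_subjects()` simply requires each subject to be a list holding both `term` and `identifier`, which a flat list of strings, such as a markdown item list would yield, can never satisfy.

```r
# Hypothetical check (not part of deposits): every Zenodo subject must be
# a list holding both a "term" and an "identifier".
check_zenodo_subjects <- function (subjects) {
    ok <- vapply (subjects, function (s) {
        is.list (s) && all (c ("term", "identifier") %in% names (s))
    }, logical (1))
    if (!all (ok)) {
        stop ("subject item(s) ", paste (which (!ok), collapse = ", "),
              " lack a 'term' and/or 'identifier'", call. = FALSE)
    }
    invisible (TRUE)
}

structured <- list (list (term = "this", identifier = "https://this"),
                    list (term = "that", identifier = "https://that"))
check_zenodo_subjects (structured) # passes silently
# Plain strings, as a text form would produce, carry no identifiers:
try (check_zenodo_subjects (list ("this", "that")))
```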