ropenscilabs / deposits

R Client for access to multiple data repository services
https://docs.ropensci.org/deposits/

keywords #36

Closed mpadge closed 1 year ago

mpadge commented 1 year ago

Keywords are necessary because they are really useful for searching and filtering, like here. DCMI facilitates them in the "subject" element. These could then default to c ("frictionlessdata", "deposits"), with additional keywords specified by the user.
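A minimal sketch of how such defaults might be combined with user-supplied keywords. The function name merge_keywords is hypothetical and not part of the deposits package; only the two default values come from the comment above.

```r
# Hypothetical helper (not part of deposits): combine default keywords with
# any user-supplied ones, dropping duplicates while keeping defaults first.
merge_keywords <- function (user = character (0),
                            defaults = c ("frictionlessdata", "deposits")) {
    unique (c (defaults, trimws (user)))
}

merge_keywords (c ("ecology", "deposits"))
```

Putting the defaults first means they survive de-duplication in a stable position, whatever the user passes.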


This issue is also going to serve as a place to document the processes by which JSON schemas can be adapted and specialised to the particular requirements of deposits services.

mpadge commented 1 year ago

Steps:

  1. Identify appropriate DCMI entities to hold elements
  2. Specify translations in service-specific terms in inst/extdata/<service>/from_dc.json, like in this commit which specifies that the DCMI subject is the element that should hold "keywords", and that those should be translated into a "keywords" item for Zenodo.
  3. Ensure that the target schema expresses the appropriate JSON types, like in https://github.com/ropenscilabs/deposits/blob/9063eda4251f819c96628f42988a5f5bc7774c96/inst/extdata/zenodo/schema.json#L101-L106

mpadge commented 1 year ago

The tests now include lots of examples of the use of keywords. These always have to be defined in "subject", not "description". The following code illustrates the new functionality, starting with what happens when "keywords" are defined in the wrong field:

library (deposits)
packageVersion ("deposits")
#> [1] '0.1.0.53'
metadata <- list (
    title = "New Title",
    abstract = "This is the abstract",
    creator = list (list (name = "A. Person"), list (name = "B. Person")),
    description = paste0 (
        "This is the description\n\n",
        "## keywords\none, two\nthree\n\n## version\n1.0"
    )
)
cli <- depositsClient$new (service = "zenodo", metadata = metadata, sandbox = TRUE)
#> Error: Metadata source for [keywords] should be [subject] and not [description]
cli <- depositsClient$new (service = "figshare", metadata = metadata)
#> Error: Metadata source for [keywords] should be [subject] and not [description]

The error message for both services is sufficiently informative to know what to do next:

metadata$description <- "This is the description\n\n## version\n1.0"
metadata$subject <- "## keywords\none, two\nthree"
cli <- depositsClient$new (service = "zenodo", metadata = metadata, sandbox = TRUE)
cli$deposit_new ()
#> ID of new deposit : 1177062
cli$hostdata$metadata$keywords
#> [[1]]
#> [1] "one"
#> 
#> [[2]]
#> [1] "two"
#> 
#> [[3]]
#> [1] "three"

cli <- depositsClient$new (service = "figshare", metadata = metadata)
cli$deposit_new ()
#> Files for private Figshare deposits can only be downloaded manually; no metadata can be retrieved for this deposit.
#> ID of new deposit : 22348531
cli$hostdata$tags
#> [1] "one"   "two"   "three"

Created on 2023-03-28 with reprex v2.0.2
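Note that the two services return the same keywords in different R shapes: a list of length-one strings for Zenodo, and a plain character vector for Figshare. A small sketch of comparing them, using values copied from the output above rather than a live API call:

```r
# Values copied from the printed output above, not fetched from either service
zenodo_keywords <- list ("one", "two", "three")
figshare_tags <- c ("one", "two", "three")

# unlist() flattens the Zenodo-style list into a plain character vector,
# after which the two results can be compared directly
identical (unlist (zenodo_keywords), figshare_tags)
```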

Those keywords are then uploaded to the appropriate field for each service. For Zenodo, the translation is specified here: https://github.com/ropenscilabs/deposits/blob/32dac5830d9d02a14829f2d93898d40d8064b64f/inst/extdata/zenodo/from_dc.json#L191-L193 with the array-style formatting given in the corresponding schema: https://github.com/ropenscilabs/deposits/blob/32dac5830d9d02a14829f2d93898d40d8064b64f/inst/extdata/zenodo/schema.json#L101-L106

The corresponding translation entry for Figshare is here: https://github.com/ropenscilabs/deposits/blob/32dac5830d9d02a14829f2d93898d40d8064b64f/inst/extdata/figshare/from_dc.json#L162-L164 and the expected form in the schema is here: https://github.com/ropenscilabs/deposits/blob/32dac5830d9d02a14829f2d93898d40d8064b64f/inst/extdata/figshare/schema.json#L26-L31

And that's it.

mpadge commented 1 year ago

Note for future reference

This issue has been closed, but should be re-opened at some stage to extend and refine the interpretation of expected formats according to translation schemas. The final examples above from both Figshare and Zenodo "schema.json" files show the formats:

"keywords": {
    "type": "array",
    "items": {
        "type": "string"
    }
}
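As a rough illustration of what that schema fragment demands of an R value, the following base-R check accepts either a character vector or a list of single strings. The function is_string_array is a hypothetical name used only for this sketch; it is not how the package itself validates against its schemas.

```r
# Hypothetical check mirroring {"type": "array", "items": {"type": "string"}}.
# Accepts a character vector, or a list whose elements are all single strings.
is_string_array <- function (x) {
    if (is.list (x)) {
        ok <- vapply (
            x,
            function (i) is.character (i) && length (i) == 1L,
            logical (1)
        )
        return (all (ok))
    }
    is.character (x)
}

is_string_array (c ("one", "two", "three")) # character vector
is_string_array (list ("one", "two", "three")) # list of single strings
is_string_array (list ("one", 2)) # contains a non-string item
```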

Inputs generally have to be compound strings like those illustrated above:

metadata$subject <- "## keywords\none, two\nthree"
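To make the structure of such a compound string concrete, here is a base-R sketch that splits it into named sections and then into individual items. Both function names are hypothetical, and this is not the package's own routine; it only assumes that sections start with "## " and that items are separated by commas or newlines, as in the examples above.

```r
# Hypothetical helpers, not part of deposits.

# Split a compound string into a named list of sections, one per "## " header.
parse_compound <- function (x) {
    lines <- strsplit (x, "\n", fixed = TRUE) [[1]]
    is_header <- grepl ("^##[[:space:]]*", lines)
    # Each header starts a new section; lines before any header get index 0
    section_id <- cumsum (is_header)
    keep <- section_id > 0 & !is_header
    sections <- split (lines [keep], section_id [keep])
    header_names <- sub ("^##[[:space:]]*", "", lines [is_header])
    names (sections) <- header_names [as.integer (names (sections))]
    sections
}

# Split the lines of one section on commas and drop empty items.
as_array <- function (section_lines) {
    items <- trimws (unlist (strsplit (section_lines, ",", fixed = TRUE)))
    items [nzchar (items)]
}

sections <- parse_compound ("## keywords\none, two\nthree")
as_array (sections$keywords)
```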

Those strings then have to be converted accordingly. The current routine is here: https://github.com/ropenscilabs/deposits/blob/6fdbf5a461ca63f6bdd16aca56c3d80313726bc4/R/metadata-translate.R#L274-L311 That shows that it only converts arrays. Other conversions will need to be added when use cases arise. It's difficult to do without those, because the conversion routines need to be tested against external APIs to ensure they work as expected. Likely conversions will include:

TODO

mpadge commented 1 year ago

That commit above should have referred to #62, not #36