nulib / arch

Northwestern University institutional repository, built on Samvera's Hyrax gem.

Add JSON-LD to public Dataset Work view with Schema.org syntax #771

Open chrisdaaz opened 3 years ago

chrisdaaz commented 3 years ago

Descriptive summary

Dataset discoverability could be a strong selling point to researchers evaluating Arch as a data repository. Google indexes datasets on the web using Schema.org metadata embedded in the <head> element of a page containing dataset information. We can pull metadata from the Dataset work type to generate JSON-LD for Google's web crawlers for every deposited dataset.

Metadata Mappings

Work

Arch Metadata      Schema.org Metadata
Description        description
Title              name
Creator            creator
DOI                identifier
License            license
Keywords           keywords
DOI                url
true (constant)    isAccessibleForFree

Relevant documentation: https://developers.google.com/search/docs/data-types/dataset#structured-data-type-definitions
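A minimal sketch of the Work mapping above, written in Python for illustration only (Arch is a Rails/Hyrax app, so the real version would live in Ruby). The input dict and its keys (title, creators, doi, and so on) are hypothetical stand-ins for the Dataset work's attributes, not Hyrax's API, and the DOI is assumed to be stored as a full https://doi.org/ URL.

```python
import json


def dataset_json_ld(work: dict) -> dict:
    """Build a Schema.org Dataset object from Arch Dataset work metadata.

    `work` is a hypothetical plain dict standing in for the Hyrax work;
    its keys are illustrative assumptions, not Arch's real attribute names.
    """
    doi_url = work["doi"]  # assumed to be a full https://doi.org/... URL
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": work["title"],
        "description": work["description"],
        "creator": [{"@type": "Person", "name": n} for n in work["creators"]],
        # Per the mapping table, the DOI fills both identifier and url.
        "identifier": doi_url,
        "url": doi_url,
        "license": work["license"],
        "keywords": work["keywords"],
        "isAccessibleForFree": True,  # constant, per the mapping table
    }


if __name__ == "__main__":
    sample = {
        "title": "Data Deposit (with many file formats)",
        "description": "Data deposit: images, docs, sound, videos.",
        "creators": ["chris"],
        "doi": "https://doi.org/10.21985/n2-bpek-5555",
        "license": "http://opendatacommons.org/licenses/by/1.0/",
        "keywords": [],
    }
    print(json.dumps(dataset_json_ld(sample), indent=2))
```

Keeping the mapping in one function makes the url choice easy to revisit: the staging example under Expected behavior uses the work page URL for url and reserves the DOI for identifier.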

Files

The distribution property describes where the files can be downloaded and in what format. Here's an example:

"distribution": [
  {
    "@type": "DataDownload",
    "encodingFormat": "CSV",
    "contentUrl": "http://www.ncdc.noaa.gov/stormevents/ftp.jsp"
  },
  {
    "@type": "DataDownload",
    "encodingFormat": "XML",
    "contentUrl": "http://gis.ncdc.noaa.gov/all-records/catalog/search/resource/details.page?id=gov.noaa.ncdc:C00510"
  }
]

It would be great to produce a distribution record for each FileSet within the Dataset, starting with one whose contentUrl points to the ZIP archive behind "Download All Files", if one is available.

Relevant documentation: https://developers.google.com/search/docs/data-types/dataset#download
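A sketch of the distribution records described above, assuming each FileSet can be reduced to a MIME type and a direct download URL. The mime_type and download_url keys and the zip_url argument are hypothetical. It emits MIME types for encodingFormat, which Schema.org recommends, rather than the "CSV"/"XML" labels in the NOAA example.

```python
from typing import Optional


def distribution(file_sets: list[dict], zip_url: Optional[str] = None) -> list[dict]:
    """Return Schema.org DataDownload records for a Dataset.

    The "Download All Files" ZIP (if any) comes first, followed by one
    record per FileSet. Each FileSet is a hypothetical dict with
    "mime_type" and "download_url" keys.
    """
    records = []
    if zip_url:
        records.append({
            "@type": "DataDownload",
            "encodingFormat": "application/zip",
            "contentUrl": zip_url,
        })
    for fs in file_sets:
        records.append({
            "@type": "DataDownload",
            "encodingFormat": fs["mime_type"],
            "contentUrl": fs["download_url"],
        })
    return records


if __name__ == "__main__":
    files = [{"mime_type": "text/csv", "download_url": "https://example.org/files/data.csv"}]
    for record in distribution(files, zip_url="https://example.org/files/all.zip"):
        print(record["encodingFormat"], record["contentUrl"])
```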

Expected behavior

Here's an example for a work on Arch staging: https://arch.stack.rdc-staging.library.northwestern.edu/concern/datasets/1j92g770w?locale=en

  <head>
    <script type="application/ld+json">
    {
      "@context": "https://schema.org/",
      "@type": "Dataset",
      "name": "Data Deposit (with many file formats)",
      "description": "Data deposit: images, docs, sound, videos.",
      "url": "https://arch.stack.rdc-staging.library.northwestern.edu/concern/datasets/1j92g770w",
      "identifier": "https://doi.org/10.21985/n2-bpek-5555",
      "license": "http://opendatacommons.org/licenses/by/1.0/",
      "isAccessibleForFree": true,
      "creator": {
        "@type": "Person",
        "name": "chris"
      },
      "includedInDataCatalog": {
        "@type": "DataCatalog",
        "name": "Arch",
        "provider": {
          "@type": "Organization",
          "name": "Northwestern University Libraries"
        },
        "url": "https://arch.library.northwestern.edu/"
      },
      "distribution": [
        {
          "@type": "DataDownload",
          "encodingFormat": "ZIP",
          "contentUrl": "https://arch.stack.rdc-staging.library.northwestern.edu/concern/datasets/1j92g770w?locale=en"
        }
      ]
    }
    </script>
  </head>
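Hand-assembling JSON in a view template makes it easy to leave trailing commas, which JSON does not allow. A sketch of generating the tag from a serializer instead; the function name is hypothetical, and Python's json module stands in for whatever serializer the Rails view would use. It also escapes "</" so that a value containing "</script>" cannot end the element early.

```python
import json


def json_ld_script_tag(data: dict) -> str:
    """Serialize a JSON-LD object into a <script> element for the page <head>.

    Using a JSON serializer rather than a hand-written template rules out
    trailing commas and unescaped line breaks inside strings.
    """
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # "</" inside a string value (e.g. a description mentioning "</script>")
    # would close the element early; "<\/" is an equivalent JSON escape.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(json_ld_script_tag({"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}))
```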

Related work

https://github.com/nulib/arch/issues/744

chrisdaaz commented 2 years ago

For possible reference, similar work has been done in the University of Michigan's Deep Blue Data repository, with a working example that looks like this:

<!-- Add Schema for Google Search for work view; the if check is here to make sure you are in work page -->
 <script type="application/ld+json">
 {
 "@context": "http://schema.org/",
 "@type": "Dataset",
 "@id": "https://doi.org/10.7302/1s0g-b468",
 "name": "Simulated historical (1995-2014) and future (2081-2100) pollen emission using PECM2.0 Raw data",
 "description": ["Atmospheric aerosols are emitted from both natural and anthropogenic sources, and they play an important role in climate, impacting solar radiation and cloud formation. Compared to other types of aerosol particles, primary biological aerosol particles (PBAP, e.g., fungal spores, bacteria, pollen, virus, etc.) are relatively understudied. However, they are linked to adverse health effects and have the potential to influence ice nucleation at higher temperatures. Anemophilous (or wind-driven) pollen is one of the important PBAP, impacts cloud properties under some conditions, and triggers allergic diseases such as allergic rhinitis (also known as hay fever) and asthma. Because pollen emission is closely associated with environmental drivers, the climatic change could influence pollen emission and consequently the incidence of allergic disease. Using CMIP6 model data, our research projects continental-scale changes in pollen emissions at the end of the century, considering the effects of temperature, precipitation, CO2, and future vegetation distribution change. While prior studies have evaluated single types of pollen, we use a mechanistic model to comprehensively simulate total pollen across the United States from all sources. Similar to previous single-source pollen studies, our simulations suggest that pollen season duration will lengthen, and pollen emission will increase in the future, but in addition, we identify new synergies between different pollen types that can influence the maximum daily pollen. Our work highlights that the changes of overlap between pollen seasons of different vegetation taxa can magnify or mitigate the impacts of climate change, which addresses the importance to study all pollen emissions comprehensively. Given pollen is one of the most common triggers of seasonal allergies, our findings also provide information to evaluate global health conditions in the future.\nIn this study, all of the pollen emission data are written in NetCDF files. "],
 "url": "https://deepblue.lib.umich.edu/data/concern/data_sets/0c483j691",
 "identifier": "https://doi.org/10.7302/1s0g-b468",
 "isAccessibleForFree": true,
 "keywords": ["Pollen emission change","Climate change","Public health","Vegetation land cover change","CO2 effects"],
 "creator": [{ "@type": "Person",
 "name": "Zhang, Yingxiao MI"},{ "@type": "Person",
 "name": "Steiner, Allison MI"}],
 "citation": "Zhang, Y., Steiner, A. (2022). Simulated historical (1995-2014) and future (2081-2100) pollen emission using PECM2.0 Raw data [Data set], University of Michigan - Deep Blue Data. https://doi.org/10.7302/1s0g-b468",
 "license":
 {"@type": "CreativeWork",
 "name": "Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)",
 "license": "http://creativecommons.org/licenses/by-nc/4.0/"},
 "publisher":
 {"@id": "https://deepblue.lib.umich.edu/data/",
 "@type": "Organization",
 "legalName": "University of Michigan - Deep Blue Data",
 "name": "Deep Blue Data",
 "url": "https://deepblue.lib.umich.edu/data"}
 }
 </script>
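A rough local check for pages like the examples above: it pulls every application/ld+json block out of the HTML, confirms it parses as JSON, and, for Dataset objects, checks the two properties Google lists as required (name and description). The class and function names are illustrative, and this is a sketch, not a replacement for Google's own structured data testing tools. It catches trailing commas and raw line breaks inside strings, both of which are invalid JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of each <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append.
        if self._inside:
            self.blocks[-1] += data


def check_datasets(html: str) -> list:
    """Return human-readable problems found in a page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    problems = []
    for i, block in enumerate(parser.blocks):
        try:
            data = json.loads(block)  # strict: rejects raw control characters
        except json.JSONDecodeError as err:
            problems.append(f"block {i}: invalid JSON ({err.msg})")
            continue
        # A JSON-LD block may hold a single object or a list of objects.
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Dataset":
                for prop in ("name", "description"):
                    if not item.get(prop):
                        problems.append(f"block {i}: Dataset missing {prop!r}")
    return problems
```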