oxinabox / DataDepsGenerators.jl

Utility for developers to help define DataDeps registration blocks, for reusing existing Data with DataDeps.jl

Support *all* DOIs? (CrossRef and DataCite, at least), via Content Negotiation for RDF? #29

Closed: oxinabox closed this 6 years ago

oxinabox commented 6 years ago

Right now, using #28, we support all DataCite DOIs via the DataCite REST API. Well, everything except getting the actual download URL. We also support some DOIs more fully using other services (e.g. DataDryad).

From https://github.com/oxinabox/DataDepsGenerators.jl/pull/28#issuecomment-397503502, some more thoughts on what we can get out of content negotiation (https://citation.crosscite.org/docs.html).

RDF is one of the most common formats for the semantic web. It is supported by at least three of the DOI registration agencies, including CrossRef and DataCite.

NB: This is low priority. For DataCite it gives us nothing new over #28, and while there are 40 million CrossRef DOIs, they are mostly not data.

Compare the two outputs below. While the current DataCite API gives us more fields, it does not give us any additional useful ones.

Both have the same set of useful fields: everything we need except the actual download URL.
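Content negotiation, as in the curl commands in this thread, is just an HTTP GET on the DOI resolver with an `Accept` header naming the metadata format you want. A minimal Python sketch using only the standard library; the helper names `doi_metadata_request` and `fetch_doi_metadata` are hypothetical and not part of DataDepsGenerators.jl:

```python
from urllib.request import Request, urlopen

# Resolver used for DOI content negotiation (as in the curl examples).
DOI_RESOLVER = "https://doi.org/"


def doi_metadata_request(doi: str, accept: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a content-negotiation request for a DOI.

    Hypothetical helper: `accept` is the metadata media type requested,
    e.g. "application/rdf+xml" for the RDF/XML shown in this thread.
    """
    # Accept bare DOIs as well as resolver URLs or "doi:" prefixes.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return Request(DOI_RESOLVER + doi, headers={"Accept": accept})


def fetch_doi_metadata(doi: str, accept: str = "application/rdf+xml") -> bytes:
    """Send the request; urllib follows the resolver's redirects (like curl -L)."""
    with urlopen(doi_metadata_request(doi, accept), timeout=30) as resp:
        return resp.read()


if __name__ == "__main__":
    req = doi_metadata_request("10.6084/m9.figshare.5350216.v1")
    print(req.full_url)              # https://doi.org/10.6084/m9.figshare.5350216.v1
    print(req.get_header("Accept"))  # application/rdf+xml
```

Separating request construction from sending keeps the format choice testable offline; swapping `accept` is all it takes to ask for a different representation.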

Content negotiation: RDF/XML (CrossRef, DataCite, mEDRA, probably others)

curl -LH "Accept: application/rdf+xml" https://doi.org/10.6084/m9.figshare.5350216.v1
<?xml version='1.0' encoding='utf-8' ?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:schema='http://schema.org/'>
  <schema:Dataset rdf:about='https://doi.org/10.6084/m9.figshare.5350216.v1'>
    <schema:author>
      <rdf:Description rdf:nodeID='b0'>
        <schema:name>Figshare Figshare</schema:name>
      </rdf:Description>
    </schema:author>
    <schema:dateCreated rdf:datatype='http://schema.org/Date'>2017-08-28</schema:dateCreated>
    <schema:dateModified rdf:datatype='http://schema.org/Date'>2017-08-28</schema:dateModified>
    <schema:datePublished rdf:datatype='http://schema.org/Date'>2017</schema:datePublished>
    <schema:description>figshare for Institutions information booklet about managing and disseminating research data to make it more citable, shareable and discoverable.</schema:description>
    <schema:identifier rdf:resource='https://doi.org/10.6084/m9.figshare.5350216.v1' />
    <schema:isPartOf>
      <schema:DataCatalog rdf:nodeID='b1'>
      </schema:DataCatalog>
    </schema:isPartOf>
    <schema:keywords>80707 Organisation of Information and Knowledge Resources</schema:keywords>
    <schema:license rdf:resource='https://creativecommons.org/licenses/by/4.0' />
    <schema:name>figshare for Institutions - Information booklet</schema:name>
    <schema:publisher>
      <schema:Organization rdf:nodeID='b2'>
        <schema:name>Figshare</schema:name>
      </schema:Organization>
    </schema:publisher>
    <schema:schemaVersion rdf:resource='http://datacite.org/schema/kernel-3' />
  </schema:Dataset>
</rdf:RDF>
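To make the "same useful fields" point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` that pulls those fields out of an RDF/XML response shaped like the one above. The function name and the chosen field set (name, description, authors, license, datePublished) are assumptions for illustration, and it only handles this flat schema.org-in-RDF layout, not arbitrary RDF:

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the RDF/XML response above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "http://schema.org/",
}
# ElementTree stores namespaced attributes in "{uri}local" form.
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def useful_fields_from_rdf(rdf_xml: str) -> dict:
    """Extract DataDeps-relevant fields from schema.org-flavoured RDF/XML.

    Assumes a single schema:Dataset directly under rdf:RDF, as in the
    figshare example; this is not a general RDF parser.
    """
    root = ET.fromstring(rdf_xml)
    dataset = root.find("schema:Dataset", NS)
    if dataset is None:
        raise ValueError("no schema:Dataset element found")

    def child_text(tag):
        # Direct children only, so the publisher's nested schema:name is skipped.
        el = dataset.find(tag, NS)
        return el.text.strip() if el is not None and el.text else None

    license_el = dataset.find("schema:license", NS)
    return {
        "name": child_text("schema:name"),
        "description": child_text("schema:description"),
        # Authors are blank nodes: schema:author/rdf:Description/schema:name
        "authors": [
            el.text.strip()
            for el in dataset.findall("schema:author/rdf:Description/schema:name", NS)
            if el.text
        ],
        # The licence is a URI in an rdf:resource attribute, not element text.
        "license": license_el.get(RDF_RESOURCE) if license_el is not None else None,
        "published": child_text("schema:datePublished"),
        # Nothing in this response gives the actual download URL.
    }
```

Applied to the figshare response, this should yield the title, the single author "Figshare Figshare", the CC BY 4.0 licence URI and "2017", and, as noted, no download URL.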

Current: DataCite REST API (DataCite DOIs only)

curl -L https://api.datacite.org/works/10.6084/m9.figshare.5350216.v1

{
  "data": {
    "id": "https:\/\/doi.org\/10.6084\/m9.figshare.5350216.v1",
    "type": "works",
    "attributes": {
      "doi": "10.6084\/m9.figshare.5350216.v1",
      "identifier": "https:\/\/doi.org\/10.6084\/m9.figshare.5350216.v1",
      "url": null,
      "author": [
        {
          "literal": "Figshare Figshare"
        }
      ],
      "title": "figshare for Institutions - Information booklet",
      "container-title": "Figshare",
      "description": "figshare for Institutions information booklet about managing and disseminating research data to make it more citable, shareable and discoverable.",
      "resource-type-subtype": "Dataset",
      "data-center-id": "figshare.ars",
      "member-id": "figshare",
      "resource-type-id": "dataset",
      "version": null,
      "license": "https:\/\/creativecommons.org\/licenses\/by\/4.0\/",
      "schema-version": "3",
      "results": [

      ],
      "related-identifiers": [

      ],
      "published": "2017",
      "registered": "2017-08-28T00:45:54Z",
      "checked": null,
      "updated": "2017-09-22T04:19:27Z",
      "media": null,
      "xml": "PHJlc291cmNlIHhtbG5zPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtMyIgeG1sbnM6eHNpPSJodHRwOi8vd3d3LnczLm9yZy8yMDAxL1hNTFNjaGVtYS1pbnN0YW5jZSIgeHNpOnNjaGVtYUxvY2F0aW9uPSJodHRwOi8vZGF0YWNpdGUub3JnL3NjaGVtYS9rZXJuZWwtMyBodHRwOi8vc2NoZW1hLmRhdGFjaXRlLm9yZy9tZXRhL2tlcm5lbC0zL21ldGFkYXRhLnhzZCI+PGlkZW50aWZpZXIgaWRlbnRpZmllclR5cGU9IkRPSSI+MTAuNjA4NC9tOS5maWdzaGFyZS41MzUwMjE2LnYxPC9pZGVudGlmaWVyPjxjcmVhdG9ycz48Y3JlYXRvcj48Y3JlYXRvck5hbWU+Zmlnc2hhcmUgZmlnc2hhcmU8L2NyZWF0b3JOYW1lPjwvY3JlYXRvcj48L2NyZWF0b3JzPjx0aXRsZXM+PHRpdGxlPmZpZ3NoYXJlIGZvciBJbnN0aXR1dGlvbnMgLSBJbmZvcm1hdGlvbiBib29rbGV0PC90aXRsZT48L3RpdGxlcz48ZGVzY3JpcHRpb25zPjxkZXNjcmlwdGlvbiBkZXNjcmlwdGlvblR5cGU9IkFic3RyYWN0Ij5maWdzaGFyZSBmb3IgSW5zdGl0dXRpb25zIGluZm9ybWF0aW9uIGJvb2tsZXQgYWJvdXQgbWFuYWdpbmcgYW5kIGRpc3NlbWluYXRpbmcgcmVzZWFyY2ggZGF0YSB0byBtYWtlIGl0IG1vcmUgY2l0YWJsZSwgc2hhcmVhYmxlIGFuZCBkaXNjb3ZlcmFibGUuPC9kZXNjcmlwdGlvbj48L2Rlc2NyaXB0aW9ucz48c3ViamVjdHM+PHN1YmplY3Qgc2NoZW1lVVJJPSJodHRwOi8vd3d3LmFicy5nb3YuYXUvYXVzc3RhdHMvYWJzQC5uc2YvMC82QkI0MjdBQjk2OTZDMjI1Q0EyNTc0MTgwMDA0NDYzRSIgc3ViamVjdFNjaGVtZT0iRk9SIj44MDcwNyBPcmdhbmlzYXRpb24gb2YgSW5mb3JtYXRpb24gYW5kIEtub3dsZWRnZSBSZXNvdXJjZXM8L3N1YmplY3Q+PC9zdWJqZWN0cz48cHVibGlzaGVyPkZpZ3NoYXJlPC9wdWJsaXNoZXI+PHB1YmxpY2F0aW9uWWVhcj4yMDE3PC9wdWJsaWNhdGlvblllYXI+PGRhdGVzPjxkYXRlIGRhdGVUeXBlPSJDcmVhdGVkIj4yMDE3LTA4LTI4PC9kYXRlPjxkYXRlIGRhdGVUeXBlPSJVcGRhdGVkIj4yMDE3LTA4LTI4PC9kYXRlPjwvZGF0ZXM+PHJlc291cmNlVHlwZSByZXNvdXJjZVR5cGVHZW5lcmFsPSJEYXRhc2V0Ij5EYXRhc2V0PC9yZXNvdXJjZVR5cGU+PHNpemVzPjxzaXplPjEzNjM3MDIgQnl0ZXM8L3NpemU+PC9zaXplcz48cmVsYXRlZElkZW50aWZpZXJzPjxyZWxhdGVkSWRlbnRpZmllciByZWxhdGVkSWRlbnRpZmllclR5cGU9IkRPSSIgcmVsYXRpb25UeXBlPSJJc1ByZXZpb3VzVmVyc2lvbk9mIj4xMC42MDg0L205LmZpZ3NoYXJlLjUzNTAyMTY8L3JlbGF0ZWRJZGVudGlmaWVyPjwvcmVsYXRlZElkZW50aWZpZXJzPjxyaWdodHNMaXN0PjxyaWdodHMgcmlnaHRzVVJJPSJodHRwczovL2NyZWF0aXZlY29tbW9ucy5vcmcvbGljZW5zZXMvYnkvNC4wLyI+Q0MgQlkgNC4wPC9yaWdodHM+PC9yaWdodHNMaXN0PjwvcmVzb3VyY2U+"
    },
    "relationships": {
      "data-center": {
        "data": {
          "id": "figshare.ars",
          "type": "data-centers"
        }
      },
      "member": {
        "data": {
          "id": "figshare",
          "type": "members"
        }
      },
      "resource-type": {
        "data": {
          "id": "dataset",
          "type": "resource-types"
        }
      }
    }
  }
}
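For comparison, a minimal sketch that reads the same fields from the DataCite REST API JSON above and decodes the base64-encoded `xml` attribute (the raw DataCite kernel XML record) with the standard `base64` module. The attribute names come from the response shown; the function names are hypothetical, and the `given`/`family` author fallback is an assumption, since this example only shows a `literal` author:

```python
import base64
import json


def useful_fields_from_datacite_json(body: str) -> dict:
    """Pull the DataDeps-relevant fields out of a DataCite /works response."""
    attrs = json.loads(body)["data"]["attributes"]
    return {
        "name": attrs.get("title"),
        "description": attrs.get("description"),
        # This example only has "literal"; given/family is an assumed fallback.
        "authors": [
            a.get("literal") or " ".join(filter(None, [a.get("given"), a.get("family")]))
            for a in attrs.get("author") or []
        ],
        "license": attrs.get("license"),
        "published": attrs.get("published"),
        "url": attrs.get("url"),  # null in the example above: no download URL
    }


def decode_datacite_xml(body: str) -> str:
    """Decode the base64 'xml' attribute into the raw DataCite metadata XML."""
    encoded = json.loads(body)["data"]["attributes"].get("xml")
    return base64.b64decode(encoded).decode("utf-8") if encoded else ""
```

The decoded XML carries a few extras (sizes, related identifiers, subject scheme), but still no download URL.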

oxinabox commented 6 years ago

See #42 for more

mfenner commented 6 years ago

Instead of DOI content negotiation via doi.org, you can also go directly to data.datacite.org. This lets you use the DataCite content negotiation service for DOIs from other registration agencies (currently only Crossref). For example (the first DOI is from DataCite, the second from Crossref):

curl -LH "Accept: application/vnd.schemaorg.ld+json" https://data.datacite.org/10.6084/m9.figshare.5350216.v1
curl -LH "Accept: application/vnd.schemaorg.ld+json" https://data.datacite.org/10.1371/journal.pbio.2001414
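
The same request can be wrapped for the data.datacite.org endpoint. A minimal sketch that builds the request with the schema.org JSON-LD media type and reads a few schema.org properties from the reply. The helper names are hypothetical, and the exact JSON-LD shape (e.g. whether `author` is a single object or a list) is an assumption, so the parser accepts both:

```python
import json
from urllib.request import Request, urlopen

DATACITE_CN = "https://data.datacite.org/"
SCHEMAORG_JSONLD = "application/vnd.schemaorg.ld+json"


def datacite_cn_request(doi: str, accept: str = SCHEMAORG_JSONLD) -> Request:
    """Content negotiation via data.datacite.org (DataCite and Crossref DOIs)."""
    return Request(DATACITE_CN + doi, headers={"Accept": accept})


def _as_list(value):
    # schema.org JSON-LD may give a single object or a list of them.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def fields_from_schemaorg(body: str) -> dict:
    """Read a few schema.org properties; assumed shape, not a documented contract."""
    doc = json.loads(body)
    return {
        "name": doc.get("name"),
        "description": doc.get("description"),
        "authors": [a.get("name") for a in _as_list(doc.get("author")) if isinstance(a, dict)],
        "license": doc.get("license"),
        "published": doc.get("datePublished"),
    }


def fetch_schemaorg(doi: str) -> dict:
    """Network call: fetch and parse the JSON-LD for one DOI."""
    with urlopen(datacite_cn_request(doi), timeout=30) as resp:
        return fields_from_schemaorg(resp.read().decode("utf-8"))
```

One code path would then cover both registration agencies, as long as the endpoint supports the agency's metadata schema.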

mfenner commented 6 years ago

This of course also works for other content types. See https://github.com/datacite/bolognese for what is currently supported.

oxinabox commented 6 years ago

Ah, fantastic! I had no idea that I could use data.datacite.org to work with DOIs issued by anyone other than DataCite.

mfenner commented 6 years ago

Strictly speaking only Crossref at this point, as you need to build in the support for the respective metadata schema.

oxinabox commented 6 years ago

Closed in #43