jrochkind commented 2 years ago

As a specific use case for the existing #201, we now want to focus on the Max Planck oral history project and create an API they can use to get our oral history metadata.

Treat them as a use case/user when gathering requirements and spec'ing out what we have (although ideally it would be available for all items, not just OH, per #201).

jrochkind commented 1 year ago

Draft public doc intro for API availability. Not sure where it will go. Re-written from FAQ "Can I freely download or extract your metadata?". Being edited here for now.

API access

We strive to make our open data freely available, but the options we provide for machine-readable metadata access currently consist of somewhat limited and disparate services. If you have a project that could benefit from more convenient or standardized machine-accessible APIs for metadata access, please get in touch to share your use case.

OAI-PMH

We have an OAI-PMH feed that provides access to our metadata in XML. The fields are based on the OAI-DC schema, with extensions suggested by the DPLA metadata application profile, since the feed's main use case is harvesting by DPLA.

This metadata includes standardized basic descriptive attributes, but does not include all internal, administrative, and relational metadata.

You can bulk-harvest our records via the OAI-PMH 2.0 endpoint at https://digital.sciencehistory.org/oai.
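As an illustration, here is a minimal standard-library Python sketch of a bulk harvest against that endpoint. It assumes requests use the oai_dc metadata prefix (which every OAI-PMH 2.0 repository must support; the DPLA-flavored fields might be served under another prefix, which the ListMetadataFormats verb would reveal). It follows resumptionTokens until the list is complete. The function names and the `fetch` callable (anything that returns a response body for a URL) are hypothetical, not part of our service.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union

OAI_BASE = "https://digital.sciencehistory.org/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base: str = OAI_BASE, resumption_token: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH 2.0 ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base + "?" + urllib.parse.urlencode(params)


def parse_page(xml_data: Union[str, bytes]) -> tuple[list[ET.Element], Optional[str]]:
    """Return the <record> elements on one page and the next resumptionToken, if any."""
    root = ET.fromstring(xml_data)
    records = root.findall(".//oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken marks the last page.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token or None


def harvest(fetch: Callable[[str], Union[str, bytes]],
            base: str = OAI_BASE) -> Iterator[ET.Element]:
    """Yield every record, requesting pages until no resumptionToken remains."""
    token = None
    while True:
        records, token = parse_page(fetch(list_records_url(base, token)))
        yield from records
        if not token:
            break
```

For example, `harvest(lambda url: urllib.request.urlopen(url).read())` would walk the whole feed (after `import urllib.request`). Passing `fetch` in keeps the paging logic separate from HTTP concerns such as retries or rate limiting.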

You can also get an OAI-DC XML representation of any individual record by adding .xml to the end of its URL, for instance: https://digital.sciencehistory.org/works/vt150j62m.xml
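A small Python sketch of using the per-record .xml URL: derive it from a work URL, then pull out Dublin Core values. It assumes the returned document uses the standard Dublin Core elements namespace (http://purl.org/dc/elements/1.1/); any DPLA extension fields in other namespaces would not be picked up. `record_xml_url` and `dc_values` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Union
from urllib.parse import urlsplit, urlunsplit

DC_NS = "http://purl.org/dc/elements/1.1/"


def record_xml_url(work_url: str) -> str:
    """Append .xml to the path of a work URL, keeping any query string intact."""
    parts = urlsplit(work_url)
    return urlunsplit(parts._replace(path=parts.path.rstrip("/") + ".xml"))


def dc_values(xml_data: Union[str, bytes], element: str) -> list[str]:
    """Collect the text of every dc:<element> (e.g. "title", "subject") in the document."""
    root = ET.fromstring(xml_data)
    return [el.text.strip() for el in root.iter(f"{{{DC_NS}}}{element}") if el.text]
```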

Atom feeds

Any list of search results is available in the Atom Syndication Format. Just add .atom to the path of any search results URL, for instance https://digital.sciencehistory.org/catalog.atom?q=chemistry instead of https://digital.sciencehistory.org/catalog?q=chemistry.
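The URL rewrite can be sketched in a few lines of Python. The helper names are hypothetical; the only behavior assumed is the one described above (insert .atom at the end of the path, leave the query string unchanged).

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def atom_url(html_url: str) -> str:
    """Turn an HTML search-results URL into its Atom equivalent."""
    parts = urlsplit(html_url)
    # Only the path changes; the query (e.g. q=chemistry) is carried over as-is.
    return urlunsplit(parts._replace(path=parts.path.rstrip("/") + ".atom"))


def catalog_search_atom_url(query: str,
                            base: str = "https://digital.sciencehistory.org") -> str:
    """Build an Atom URL for a keyword search via the q parameter."""
    return atom_url(f"{base}/catalog?{urlencode({'q': query})}")
```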

These Atom results are paginated; note the pagination links at the top of each feed.
Individual entries include a title, a thumbnail, a brief description, and a link to the HTML page.
For Works, entries also link to metadata in OAI-DC XML and local JSON formats. (We do not currently have further machine-readable metadata available for Collection records, which may also show up in search results.)
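To show how a client might read one page of these results, here is a hedged Python sketch. It assumes pagination is exposed as a feed-level Atom link with rel="next" and that the per-entry links (HTML page, thumbnail, XML and JSON metadata) are ordinary Atom link elements. Rather than guessing their exact rel/type values, it returns every link so the caller can inspect them. `parse_atom_page` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_page(xml_data: Union[str, bytes]) -> tuple[list[dict], Optional[str]]:
    """Return (entries, next_page_url) for one page of Atom search results."""
    feed = ET.fromstring(xml_data)
    # Assumption: the next page is advertised as a feed-level <link rel="next">.
    next_url = None
    for link in feed.findall(f"{ATOM}link"):
        if link.get("rel") == "next":
            next_url = link.get("href")
    entries = []
    for entry in feed.findall(f"{ATOM}entry"):
        entries.append({
            "title": entry.findtext(f"{ATOM}title", default="").strip(),
            # Per RFC 4287, a link without rel is treated as rel="alternate".
            "links": [
                {"rel": link.get("rel", "alternate"), "type": link.get("type"),
                 "href": link.get("href")}
                for link in entry.findall(f"{ATOM}link")
            ],
        })
    return entries, next_url
```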

You can also access Atom search results within any collection, including a listing of all items in a collection. For instance, for the Oral History Collection: https://digital.sciencehistory.org/collections/gt54kn818.atom

Or with a query:

https://digital.sciencehistory.org/collections/gt54kn818.atom?q=biomedicine
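Listing every item in a collection means walking all pages of its feed. Here is a minimal Python sketch, again assuming the next page is advertised by a rel="next" link. `fetch` is a caller-supplied function returning the body for a URL, and `max_pages` is an arbitrary safety cap; both, like the function names, are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Union
from urllib.parse import urlencode

BASE = "https://digital.sciencehistory.org"
ATOM = "{http://www.w3.org/2005/Atom}"


def collection_atom_url(collection_id: str, query: Optional[str] = None) -> str:
    """Atom URL for a collection, optionally narrowed by a q= keyword query."""
    url = f"{BASE}/collections/{collection_id}.atom"
    return f"{url}?{urlencode({'q': query})}" if query else url


def collection_entry_titles(fetch: Callable[[str], Union[str, bytes]],
                            collection_id: str, query: Optional[str] = None,
                            max_pages: int = 1000) -> Iterator[str]:
    """Yield entry titles across all pages, following rel="next" links."""
    url: Optional[str] = collection_atom_url(collection_id, query)
    pages = 0
    while url and pages < max_pages:
        feed = ET.fromstring(fetch(url))
        pages += 1
        for entry in feed.findall(f"{ATOM}entry"):
            yield entry.findtext(f"{ATOM}title", default="").strip()
        # Stop when the page has no rel="next" link.
        url = next((link.get("href") for link in feed.findall(f"{ATOM}link")
                    if link.get("rel") == "next"), None)
```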

Individual Work metadata

For every "work", you can access metadata in an XML/OAI-DC format, or a local internal JSON format.

The OAI-DC format uses a standardized vocabulary (based on the DPLA metadata application profile) and should be fairly stable. However, it includes only a subset of our metadata. E.g.: https://digital.sciencehistory.org/works/46k32ki.xml

The JSON format is a closer representation of our internal metadata and includes more of it. However, while we will endeavor to keep it stable, it is more likely to change as a result of internal software changes. E.g.: https://digital.sciencehistory.org/works/46k32ki.json
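Because the JSON format may change, a client would do well to read it defensively. Here is a hedged Python sketch that builds both metadata URLs for a work ID and decodes the JSON. The helper names and the `fetch` callable are hypothetical, and no particular JSON field names are assumed by the helpers themselves.

```python
import json
from typing import Callable, Union

BASE = "https://digital.sciencehistory.org"


def work_metadata_urls(work_id: str) -> dict[str, str]:
    """Return the OAI-DC XML and local JSON metadata URLs for a work ID."""
    return {
        "oai_dc_xml": f"{BASE}/works/{work_id}.xml",
        "json": f"{BASE}/works/{work_id}.json",
    }


def load_work_json(fetch: Callable[[str], Union[str, bytes]], work_id: str) -> dict:
    """Fetch and decode a work's local JSON, checking that it is a JSON object.

    Its fields track internal software and may change, so callers should
    read keys with dict.get() rather than assuming they exist.
    """
    data = json.loads(fetch(work_metadata_urls(work_id)["json"]))
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object for work {work_id!r}")
    return data
```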

At present we do not have an API response that will give access to individual files (for instance page images or audio files).

jrochkind commented 1 year ago

@eddierubeiz when you have a chance, could you give my API docs draft above a look and feedback?

eddierubeiz commented 1 year ago

A couple notes:

Under Atom Feeds: "Just substitute catalog.atom for catalog in any search result URL: for instance, https://digital.sciencehistory.org/catalog.atom?q=chemistry instead of https://digital.sciencehistory.org/catalog?q=chemistry."
No comma after "include".
Under "Individual Work metadata", I would use "files" instead of "assets". All assets are files at the end of the day, and "assets" feels jargony to me.

sciencehistory / scihist_digicoll

Metadata Export API for Max Planck use case #1758
