hypothesis / product-backlog

Where new feature ideas and current bugs for the Hypothesis product live

can we use doi: as a primary lookup for scholarly docs? #144

Open judell opened 7 years ago

judell commented 7 years ago

In the same way that we now use a PDF fingerprint as the primary lookup for annotated PDFs, might we use a DOI for scholarly articles that are quite often syndicated to many sites and typically include a DOI in HighWire or Dublin Core metadata?

Some background here: http://jonudell.net/h/syncing-syndicated-scholarly-annotations.pdf
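To make the idea concrete, here is a minimal sketch of pulling a DOI out of a page's metadata. It assumes the DOI appears in a `citation_doi` (HighWire) or `dc.identifier` (Dublin Core) meta tag; the names `normalize_doi`, `DoiMetaParser`, and `extract_dois` are hypothetical, and this is not how the Hypothesis client actually does it.

```python
import re
from html.parser import HTMLParser

# Meta tag names assumed to carry a DOI (compared lowercased).
DOI_META_NAMES = ("citation_doi", "dc.identifier")

# Prefixes publishers commonly put in front of the bare DOI.
DOI_PREFIXES = re.compile(r"^(?:doi:|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(value):
    """Strip known prefixes and lowercase; return None if it is not a DOI."""
    doi = DOI_PREFIXES.sub("", value.strip())
    return doi.lower() if doi.startswith("10.") else None


class DoiMetaParser(HTMLParser):
    """Collects distinct DOIs found in <meta> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").strip()
        if name in DOI_META_NAMES and content:
            doi = normalize_doi(content)
            if doi and doi not in self.dois:
                self.dois.append(doi)


def extract_dois(html):
    parser = DoiMetaParser()
    parser.feed(html)
    return parser.dois
```

If `extract_dois` returns exactly one value, that DOI could serve as the lookup key; zero or several values would mean falling back to the existing URL-based lookup.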

dwhly commented 7 years ago

I'm just dropping this in so it's indexed somewhere; it's not immediately relevant to the gist of this issue.

There is a library (Java for now) that Crossref uses, written by a group at the Univ of Warsaw, called CERMINE, which searches for DOIs in article PDFs. Apparently it's fairly good at picking out the DOI that identifies the article vs other DOIs that may be mentioned in the text or footnotes.
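For illustration only, here is a naive sketch of the candidate-finding half of that problem: a regular expression that lists every DOI-shaped string in already-extracted text. This is not CERMINE's method; the pattern and the function name `find_doi_candidates` are assumptions, and it does nothing to decide which candidate identifies the article itself.

```python
import re

# A loose DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:a-z0-9]+", re.IGNORECASE)


def find_doi_candidates(text):
    """Return distinct DOI-like strings in order of first appearance."""
    seen = []
    for match in DOI_PATTERN.finditer(text):
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        doi = match.group(0).rstrip(".,;)").lower()
        if doi not in seen:
            seen.append(doi)
    return seen
```

A paper with a long reference list will yield many candidates, which is why a tool that can tell the article's own DOI from cited ones is valuable.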

judell commented 7 years ago

User story: feedback on OER books

OpenStax publishes a book, and it is syndicated to other OER publishers. All of them would like annotations on any of the syndicated copies to coalesce, but each will want to use its own rel="canonical". All are willing to include meta name="citation_doi" content="the_doi".

Issues

This would seem to require them to create DOIs corresponding to the granularity of their published pages (e.g. per-chapter, maybe even per-section?). That may not be feasible. I've reached out to Kathi Fletcher at OpenStax and Hugh McGuire at Pressbooks for their opinions. If the overhead makes this a non-starter for them, the DOI-as-primary-lookup concept still makes a ton of sense for scholarly articles where the relationship of DOI to web page is 1:1. But how might we meet the need for OER publishers?

judell commented 7 years ago

Here's a proof-of-concept for DOI-first lookup: http://h.jonudell.info:82/doku.php?id=eic:doi-preferred-lookup

This actually solves a different problem that's been plaguing our NIF/SciBot colleagues and will affect anyone in the scholarly ecosystem who expects annotations to coalesce across syndicated copies of journal articles.

For the OER situation, I've discussed this with Hugh and Kathi, and they definitely won't want to mint per-chapter or per-section DOIs. But they could mint another flavor of identifier, say, dc:identifier, and we could use it in a similar way.

Hmm. It could make sense for a syndicator to use the authoritative site's rel-canonical as its dc:identifier, thus not polluting the syndicator's own rel-canonical.
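A rough sketch of how that precedence might work, assuming a lookup order of DOI, then dc:identifier, then rel-canonical, then the page URL. The function `choose_lookup_key` and the example URLs are hypothetical; nothing here reflects an actual implementation.

```python
def choose_lookup_key(page_url, doi=None, dc_identifier=None, canonical=None):
    """Pick the key under which a page's annotations would be stored.

    Assumed precedence: DOI, then dc:identifier, then rel-canonical,
    then the page's own URL.
    """
    if doi:
        return "doi:" + doi.lower()
    if dc_identifier:
        return dc_identifier
    return canonical or page_url


# The authoritative copy declares only its own rel-canonical.
authority_key = choose_lookup_key(
    "https://authority.example/book/ch1",
    canonical="https://authority.example/book/ch1",
)

# The syndicated copy keeps its own rel-canonical but points its
# dc:identifier at the authoritative copy.
syndicator_key = choose_lookup_key(
    "https://syndicator.example/copy/ch1",
    dc_identifier="https://authority.example/book/ch1",
    canonical="https://syndicator.example/copy/ch1",
)
```

Under these assumptions both copies resolve to the same key, so their annotations coalesce while each site's rel-canonical stays untouched.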

judell commented 7 years ago

Another aspect to consider in the OER case: they may serve PDFs as well as HTML pages, in which case the PDF is unlikely to be chunked in the same way the web pages are. But I guess we're already in the situation where annotated chapters/sections do not bear a 1:1 relationship to whole books.

malcook commented 5 years ago

USER STORY: feedback to author on community discussion of authored articles

Dr. X publishes 10 papers a year and would like one-stop shopping for "who is annotating / discussing my research". Their university already manages a website which lists all their publications by doi:. Dr. X asks their IT team to draw on this list of doi:s and create a weekly dump of any such activity. Resolving this issue would make that possible via the API.
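A minimal sketch of what the IT team's script might start from, assuming the Hypothesis search endpoint accepts a `doi:`-prefixed value in its `uri` parameter and returns annotations from every copy of the article (that behavior is exactly what this issue asks for, so treat it as an assumption). The function `build_search_urls` and the sample DOIs are hypothetical; the code only builds request URLs and makes no network calls.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"


def build_search_urls(dois, limit=50):
    """Build one search URL per DOI; fetching them is left to the caller."""
    urls = []
    for doi in dois:
        # Assumption: the API resolves "doi:<value>" to all known copies.
        query = urlencode({"uri": "doi:" + doi, "limit": limit})
        urls.append(SEARCH_ENDPOINT + "?" + query)
    return urls


# Hypothetical publication list for Dr. X.
weekly_urls = build_search_urls(["10.1000/xyz", "10.5555/ref.42"])
```

A weekly job could fetch each URL, keep annotations newer than the previous run, and mail the combined list to Dr. X.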