hypothesis / h

Annotate with anyone, anywhere.
https://hypothes.is/
BSD 2-Clause "Simplified" License

no equivalence between http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937 and http://europepmc.org/articles/PMC4688937 #3595

Closed. judell closed this issue 8 years ago

judell commented 8 years ago

The two URLs share these equivalences:

<meta name="dc:identifier" content="info:pmcid/PMC4688937"/>
<meta name="citation_doi" content="10.1186/s13036-015-0022-z"/>
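
For anyone who wants to check which identifiers a page declares, here is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and this is not how h itself reads metadata; it only pulls the two meta tags quoted above out of a chunk of HTML.

```python
from html.parser import HTMLParser


class MetaIdentifierParser(HTMLParser):
    """Collect the content of the <meta> tags named in WANTED (hypothetical helper)."""

    # The two meta names quoted in this issue; a real page may carry more.
    WANTED = {"dc:identifier", "citation_doi"}

    def __init__(self):
        super().__init__()
        self.identifiers = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> also arrive here.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in self.WANTED and attr.get("content"):
            self.identifiers[name] = attr["content"]


def extract_identifiers(html):
    """Return a dict mapping meta name to content for the wanted tags."""
    parser = MetaIdentifierParser()
    parser.feed(html)
    return parser.identifiers


SAMPLE = (
    '<meta name="dc:identifier" content="info:pmcid/PMC4688937"/>\n'
    '<meta name="citation_doi" content="10.1186/s13036-015-0022-z"/>'
)
identifiers = extract_identifiers(SAMPLE)
```

Running the same function over the HTML of both pages and comparing the resulting dicts is a quick way to confirm that they declare the same PMCID and DOI.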

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937 has 3 private and 1 public annotations

The badge says 4 for both URLs.

The client receives annotations for http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937 as expected: 3 in a group, 1 in public.

(Query: https://hypothes.is/api/search?_separate_replies=true&group=__world__&limit=200&offset=0&order=asc&sort=created&uri=http%3A%2F%2Fwww.ncbi.nlm.nih.gov%2Fpmc%2Farticles%2FPMC4688937%2F)

The client receives nothing for http://europepmc.org/articles/PMC4688937.

(Query: https://hypothes.is/api/search?_separate_replies=true&group=__world__&limit=200&offset=0&order=asc&sort=created&uri=http%3A%2F%2Feuropepmc.org%2Farticles%2FPMC4688937%2F)
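
To reproduce the two queries above outside the client, a small sketch like the following may help. The endpoint and parameter values are copied from the queries in this issue; the function name `build_search_url` is hypothetical, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Endpoint taken from the queries quoted in this issue.
SEARCH_ENDPOINT = "https://hypothes.is/api/search"


def build_search_url(uri, group="__world__"):
    """Build a search URL with the same parameters the client sent above."""
    params = [
        ("_separate_replies", "true"),
        ("group", group),
        ("limit", 200),
        ("offset", 0),
        ("order", "asc"),
        ("sort", "created"),
        ("uri", uri),  # urlencode percent-encodes ":" and "/" here
    ]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


europepmc_query = build_search_url("http://europepmc.org/articles/PMC4688937/")
ncbi_query = build_search_url("http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937/")
```

Fetching both URLs and comparing the responses should show the asymmetry described in this issue.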

chdorner commented 8 years ago

The documents are linked together. When I expand http://europepmc.org/articles/PMC4688937 I do get http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937/ amongst other URIs.

But the other way around I don't, because http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937/ is a rel-canonical URI, which we don't expand.
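
A toy model of that asymmetry, as a sketch only: the record type, the type labels other than "rel-canonical", the `doi:` URI form, and the function body are assumptions made for illustration, not the actual h code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentURI:
    uri: str
    type: str  # illustrative labels; only "rel-canonical" matters below


NCBI = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937/"
EUROPEPMC = "http://europepmc.org/articles/PMC4688937/"
DOI = "doi:10.1186/s13036-015-0022-z"  # assumed URI form for the DOI

# All URIs recorded against the one linked document (assumed data).
DOCUMENT = [
    DocumentURI(EUROPEPMC, "self-claim"),
    DocumentURI(NCBI, "rel-canonical"),
    DocumentURI(DOI, "dc-doi"),
]


def expand_uri(uri, document_uris):
    """Return the URIs to search for when the client asks about `uri`."""
    matches = [d for d in document_uris if d.uri == uri]
    if not matches:
        return [uri]  # unknown URI: nothing to expand to
    if any(d.type == "rel-canonical" for d in matches):
        return [uri]  # canonical URIs are deliberately not expanded
    return sorted({d.uri for d in document_uris})


from_europepmc = expand_uri(EUROPEPMC, DOCUMENT)
from_ncbi = expand_uri(NCBI, DOCUMENT)
```

In this model, a query for the Europe PMC URI also searches the NCBI URI, while a query for the NCBI URI searches only itself.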

judell commented 8 years ago

OK.

Some questions:

  1. How are they linked? I see nothing in either referring to the other. Do we know what the intermediary was/is?
  2. In the EuropePMC article I see, e.g., <link rel="alternate" type="application/rss+xml" title="Europe PMC: Content Holdings" href="/contentrss"> That kind of thing has caused problems in the past. How do we handle link rel="alternate" now?
  3. In situations like this, where content is syndicated, how can we best advise publishers to express cross-linkage?

nickstenning commented 8 years ago

  1. Both pages include DOI metadata, and (in general) document equivalence is transitive (i.e. if A is equivalent to B and B is equivalent to C then A is equivalent to C).
  2. We ignore RSS and Atom feeds as sources of equivalent URIs.
  3. Probably the way they're doing it here, by agreeing on a universal identifier for the document (in this case, a DOI) and ensuring they include standard metadata (such as DC.Identifier) which references that identifier.
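
To illustrate the transitivity in point 1, here is a minimal union-find sketch. It assumes each page contributes a claim of the form (page URI, shared identifier); the function names and the `doi:` URI form are hypothetical, and this is not a description of how h stores documents.

```python
def find(parent, x):
    """Return the representative of x's group, halving paths as we go."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the groups containing a and b."""
    root_a, root_b = find(parent, a), find(parent, b)
    if root_a != root_b:
        parent[root_b] = root_a


def equivalence_classes(claims):
    """Group URIs that are linked directly or through a shared identifier."""
    parent = {}
    for a, b in claims:
        union(parent, a, b)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(parent, node), set()).add(node)
    return list(groups.values())


NCBI = "http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4688937/"
EUROPEPMC = "http://europepmc.org/articles/PMC4688937/"
DOI = "doi:10.1186/s13036-015-0022-z"  # assumed URI form for the DOI

# Neither page mentions the other; each only claims the DOI.
classes = equivalence_classes([(NCBI, DOI), (EUROPEPMC, DOI)])
```

Both pages end up in the same group even though neither refers to the other, which is the sense in which the DOI acts as the intermediary.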

judell commented 8 years ago

2: Good!

3: If they want transitivity, how do they achieve it?

nickstenning commented 8 years ago

> If they want transitivity, how do they achieve it?

At the moment, by neither end using a <link rel=canonical>.

But that's probably temporary, even if I can't guarantee how long "temporary" will last. The reason for not expanding queries for <link rel=canonical>, if you recall, was precisely to avoid problems where a site had the same DOI on about 100 pages, so that we were trying to load thousands of annotations on a single page, most of which weren't relevant.

Eventually, it should be possible for us to add rules to the database to handle those nasty edge cases, and then perhaps we can turn off the "don't expand a <link rel=canonical>" behaviour.