hypothesis / product-backlog

Where new feature ideas and current bugs for the Hypothesis product live

add application/rdf+xml and application/opml+xml to list of ignored types for rel="alternate" #873

Open judell opened 5 years ago

judell commented 5 years ago

Annotations were aliased across a bunch of subdomains of hypotheses.org.

Here is part of the problem: when the client acquires metadata at *.hypotheses.org, the link list includes:

{href: "https://vertigo.hypotheses.org/feed/rdf", rel: "alternate", type: "application/rdf+xml"}

We intend to ignore rel="alternate" declarations that point to feeds. However, we currently exclude only the types application/rss+xml and application/atom+xml:

https://github.com/hypothesis/client/blob/734e3a25318364819a8c38ef881e4788a2b06365/src/annotator/plugin/document.js#L180

We should also exclude application/rdf+xml.
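To make the gap concrete, here is a minimal sketch of the feed-type check. It assumes the check is a regular expression of the form `/^application\/(rss|atom)\+xml/`, inferred from the `(rss|atom)` group named in this issue rather than copied from document.js. The helper `isIgnoredFeedType` is hypothetical.

```javascript
// Assumed shape of the current feed filter: only RSS and Atom count as feeds.
const CURRENT_FEED_TYPE = /^application\/(rss|atom)\+xml/;

// Hypothetical helper: true if a rel="alternate" link of this type would be ignored.
function isIgnoredFeedType(type) {
  return typeof type === "string" && CURRENT_FEED_TYPE.test(type);
}

const types = [
  "application/rss+xml",
  "application/atom+xml",
  "application/rdf+xml",
  "application/opml+xml",
];

// RDF and OPML alternates slip through and end up in the document's link list.
for (const type of types) {
  console.log(type, isIgnoredFeedType(type));
}
```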

But that explains only why, for example, everything at vertigo.hypotheses.org collapsed into https://vertigo.hypotheses.org/feed/rdf.

How did other domains like sms.hypotheses.org also collapse into the same bucket? The culprit is here:

{href: "https://www.openedition.org/opml.php?pubtype=carnet", rel: "alternate", type: "application/opml+xml"}

That declaration is common to all the *.hypotheses.org domains, so we should also exclude application/opml+xml.
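Why one shared link is enough to merge every blog: a minimal sketch, assuming that document equivalence is transitive (any two URIs recorded for the same page are equivalent, and equivalence chains across pages). Union-find is used here as a stand-in to illustrate that grouping; it is not claimed to be how the server computes equivalences. The feed/rdf and OPML hrefs come from this issue; the post URLs and the sms feed URL are hypothetical.

```javascript
// Minimal union-find over URI strings.
class UnionFind {
  constructor() {
    this.parent = new Map();
  }
  find(x) {
    if (!this.parent.has(x)) this.parent.set(x, x);
    let root = x;
    while (this.parent.get(root) !== root) root = this.parent.get(root);
    // Path compression: point every node on the path straight at the root.
    while (this.parent.get(x) !== root) {
      const next = this.parent.get(x);
      this.parent.set(x, root);
      x = next;
    }
    return root;
  }
  union(a, b) {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent.set(ra, rb);
  }
}

// Each entry: a page's own URL followed by its rel="alternate" hrefs.
// Post URLs and the sms feed URL are hypothetical.
const pages = [
  [
    "https://vertigo.hypotheses.org/some-post",
    "https://vertigo.hypotheses.org/feed/rdf",
    "https://www.openedition.org/opml.php?pubtype=carnet",
  ],
  [
    "https://sms.hypotheses.org/another-post",
    "https://sms.hypotheses.org/feed/rdf",
    "https://www.openedition.org/opml.php?pubtype=carnet",
  ],
];

// Count distinct groups after treating all URIs of a page as equivalent.
function countGroups(pageLinks) {
  const uf = new UnionFind();
  for (const uris of pageLinks) {
    for (const uri of uris) uf.union(uris[0], uri);
  }
  const roots = new Set();
  for (const uris of pageLinks) roots.add(uf.find(uris[0]));
  return roots.size;
}

console.log(countGroups(pages)); // 1: both blogs collapse via the shared OPML link
console.log(countGroups(pages.map((uris) => [uris[0]]))); // 2: feed links dropped
```

Because the OPML href is identical on every *.hypotheses.org blog, a single shared node is enough to join all of them into one group under this assumption.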

judell commented 5 years ago

I think I see a fairly low-effort solution.

  1. Change (rss|atom) to (rss|atom|rdf|opml) here: https://github.com/hypothesis/client/blob/734e3a25318364819a8c38ef881e4788a2b06365/src/annotator/plugin/document.js#L180, to prevent this from happening again.

  2. Delete all records in document_uri matching https://metabase.hypothes.is/question/436, to remove bogus hirmeos-related equivalences. (Or, perhaps better, delete the records matching https://metabase.hypothes.is/question/437, which catches slightly more: 729 vs. 627, the difference being non-hirmeos-related bogus equivalences.)
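Step 1 can be sketched as follows, assuming link metadata objects shaped like the ones quoted earlier in this issue. The helper `withoutFeedAlternates` and the canonical post URL are hypothetical and not taken from the client's code; only the widened `(rss|atom|rdf|opml)` group comes from this issue.

```javascript
// Proposed pattern: treat RDF and OPML alternates as feeds too.
const FEED_TYPE = /^application\/(rss|atom|rdf|opml)\+xml/;

// Hypothetical helper: drop rel="alternate" links that point at feeds,
// keeping every other link unchanged.
function withoutFeedAlternates(links) {
  return links.filter(
    (link) => !(link.rel === "alternate" && link.type && FEED_TYPE.test(link.type))
  );
}

// The two declarations quoted in this issue, plus a hypothetical canonical link.
const links = [
  { href: "https://vertigo.hypotheses.org/feed/rdf", rel: "alternate", type: "application/rdf+xml" },
  { href: "https://www.openedition.org/opml.php?pubtype=carnet", rel: "alternate", type: "application/opml+xml" },
  { href: "https://vertigo.hypotheses.org/some-post", rel: "canonical" },
];

// Only the canonical href survives.
console.log(withoutFeedAlternates(links).map((l) => l.href));
```

Because the pattern is anchored only at the start, a type carrying parameters such as `application/rss+xml; charset=utf-8` is still treated as a feed.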