Open judell opened 6 years ago
I think I see a fairly low-effort solution.
Change `(rss|atom)` to `(rss|atom|rdf|opml)` here: https://github.com/hypothesis/client/blob/734e3a25318364819a8c38ef881e4788a2b06365/src/annotator/plugin/document.js#L180, to prevent this happening again.
Delete all records in `document_uri` matching https://metabase.hypothes.is/question/436, to remove bogus hirmeos-related equivalences. (Or perhaps better, records matching https://metabase.hypothes.is/question/437, which catches slightly more: 729 vs 627, the difference being non-hirmeos-related bogus equivalences.)
Annotations were aliased across a bunch of subdomains of hypotheses.org:
Here is part of the problem. When the client acquires metadata at *.hypotheses.org it includes:
{href: "https://vertigo.hypotheses.org/feed/rdf", rel: "alternate", type: "application/rdf+xml"}
We intend to ignore `rel="alternate"` declarations that point to feeds. However, we only exclude the patterns `application/rss+xml` and `application/atom+xml`: https://github.com/hypothesis/client/blob/734e3a25318364819a8c38ef881e4788a2b06365/src/annotator/plugin/document.js#L180
We should also exclude `application/rdf+xml`.
But that only explains why, e.g., everything at vertigo.hypotheses.org collapsed into `https://vertigo.hypotheses.org/feed/rdf`.
How did other domains like sms.hypotheses.org also collapse into the same bucket? The culprit is here:
{href: "https://www.openedition.org/opml.php?pubtype=carnet", rel: "alternate", type: "application/opml+xml"}
So we should also exclude `application/opml+xml`. That declaration is common to all the *.hypotheses.org domains.
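To illustrate the proposed exclusion, here is a minimal sketch of a filter that drops `rel="alternate"` links whose type is one of the four feed-like media types. The names `FEED_TYPE` and `isFeedAlternate` are hypothetical and the structure is an assumption; this is not the client's actual code at the linked line. The two feed links are the ones quoted above, and the canonical link is made up for the example.

```javascript
// Hypothetical sketch, not the Hypothesis client's real implementation.
// Matches the feed-like media types discussed above: RSS, Atom, RDF, OPML.
const FEED_TYPE = /^application\/(rss|atom|rdf|opml)\+xml/;

// Returns true for rel="alternate" links that point to a feed-like resource.
function isFeedAlternate(link) {
  return (
    link.rel === 'alternate' &&
    typeof link.type === 'string' &&
    FEED_TYPE.test(link.type)
  );
}

const links = [
  // Quoted in this issue:
  { href: 'https://vertigo.hypotheses.org/feed/rdf', rel: 'alternate', type: 'application/rdf+xml' },
  { href: 'https://www.openedition.org/opml.php?pubtype=carnet', rel: 'alternate', type: 'application/opml+xml' },
  // Illustrative only (assumed, not taken from the page metadata):
  { href: 'https://vertigo.hypotheses.org/', rel: 'canonical' },
];

// Keep only links that are safe to treat as equivalent document URIs.
const kept = links.filter((link) => !isFeedAlternate(link));
console.log(kept.map((link) => link.href));
```

With the narrower `(rss|atom)` pattern, both feed links would pass through the filter, which is consistent with the aliasing described above.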