Find a way to show annotations from website X on a cloned website X'

ajpeddakotla commented 6 years ago

Revised description with additional context:

PressBooks is an open-source book publishing platform. @SteelWagstaff has been using it together with Hypothesis to create digital books where texts are augmented with media and interactive activities inside annotations.

One feature that PressBooks provides is the ability to fork a book and customize it for various purposes. Since the fork lives at a different URL than the original, annotations on the original book no longer appear on the fork.

Steel is looking for a way for instructors using forked books to re-use the annotations from the original books.

See the conversation on PressBooks here for more information: https://discourse.pressbooks.org/t/hypothesis-annotations-and-cloned-books/304

Original description:

We’d like pages/objects with isBasedOn metadata declarations to behave exactly as pages with isPartOf metadata currently do.

ajpeddakotla commented 6 years ago

Tagging Rob and Jon as an FYI.

SteelWagstaff commented 6 years ago

A similar issue is cross referenced in https://github.com/hypothesis/product-backlog/issues/677.

SteelWagstaff commented 6 years ago

Based on conversations with @robertknight, it sounds like we may want to explore a method by which annotations are copied/cloned when books are cloned. One advantage of this would be that the original annotations could be attributed to a new or different author, meaning that they could then be edited by the person who made the clone, allowing the annotations to be modified, deleted, or edited, just as the text to which they are anchored can be. @greatislander—would this be something that could be included in the Pressbooks cloning routine?

greatislander commented 6 years ago

@SteelWagstaff This doesn't sound like it would be a trivial change because at present, Pressbooks isn't "aware" of the annotation layer on top of a book. My initial thinking suggests that to achieve this, a clone operation — in addition to cloning Pressbooks content — would need to fetch all the annotations for each front matter, chapter, back matter etc. from the Hypothesis API and then create them for the cloned content by making additional requests to the Hypothesis API.

judell commented 6 years ago

Let's try a cloning operation and see what we learn. As I told @SteelWagstaff yesterday, it's easy to clone top-level annotations, harder to reconstruct threads, but the former is a reasonable starting point. Question for you, @greatislander. Until https://github.com/hypothesis/product-backlog/issues/30 is done we can't use the API to find all annotations for a book. But we can do it URL by URL. I'm guessing the set of URLs that would need to be queried to clone annotations for a whole book is available in a manifest? Given that, I can provide an initial cloning tool. Lacking that, there's still this strategy -- https://github.com/hypothesis/product-backlog/issues/235#issuecomment-357459601 -- which might suffice for an initial test.

robertknight commented 6 years ago

> Until hypothesis/product-backlog#30 is done we can't use the API to find all annotations for a book. But we can do it URL by URL.

You can do some URL-based filtering in the API using the uri.parts field. This field matches a keyword that appears anywhere in the URL. A keyword in this context is any run of characters that does not contain one of #+/:=?.-. For example: 'https://hypothes.is/api/search?uri.parts=bbc&uri.parts=com' will match annotations made on any page where the URL contains the keywords "bbc" and "com". If you wanted to limit the results so that these keywords are only allowed in the domain or path, you would need to do additional filtering of the results on the client.
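A minimal Python sketch of the query-plus-client-side-filter idea described above. The helper names `build_search_url` and `filter_by_host` are hypothetical, and the sketch assumes each search result row carries a `uri` field. It makes no network calls.

```python
from urllib.parse import urlencode, urlsplit

SEARCH_ENDPOINT = "https://hypothes.is/api/search"


def build_search_url(keywords):
    """Build a search URL that repeats the uri.parts parameter once per keyword."""
    query = urlencode([("uri.parts", keyword) for keyword in keywords])
    return f"{SEARCH_ENDPOINT}?{query}"


def filter_by_host(rows, host):
    """Keep only rows whose URI hostname is `host` or a subdomain of it.

    This is the extra client-side step: uri.parts matches keywords anywhere
    in the URL, so results can include pages on unrelated domains.
    """
    kept = []
    for row in rows:
        hostname = urlsplit(row.get("uri", "")).hostname or ""
        if hostname == host or hostname.endswith("." + host):
            kept.append(row)
    return kept
```

For example, `build_search_url(["bbc", "com"])` reproduces the query shown above, and `filter_by_host(rows, "bbc.com")` would drop a result such as `https://example.org/bbc/com` that only matches on its path.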

This is a bit awkward compared to if we supported site=bbc.com in the query, but it will have to do until https://github.com/hypothesis/product-backlog/issues/30 is addressed.

judell commented 6 years ago

Thanks @robertknight. I knew about uri.parts but hadn't thought of combining several parts as you show here. Clever!

robertknight commented 6 years ago

I have revised the issue description based on my understanding after talking with @SteelWagstaff at the hackday. Please let me know if I've made any errors.

Based on that description, there are a few possibilities I can think of:

1. Write a tool to clone the annotations from the original book and associate the new annotations with a different user, URL and group, as I think @judell is suggesting. This could be done outside of Hypothesis as an OAuth client using the API to read and write annotations.
2. Provide a way for the client to show the annotations from the original on the copy but without associating any new annotations made on the copy with the original. I think this is what the isBasedOn suggestion (which I understand to mean a <meta name="dc.relation.isBasedOn" content="{original URL}"> tag in the page similar to our dc.relation.isPartOf support) was getting at. Note that the requirement here is not the same as isPartOf because that identifier is also used when creating new annotations. In this context we only want existing annotations made on the original book to show up. New annotations would presumably only show up on the forked book (?)
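To make the isBasedOn idea concrete, here is a rough Python sketch of reading such a tag from a page. This is an illustration only: `dc.relation.isBasedOn` is the proposal under discussion rather than an existing feature, the real client is not written in Python, and `BasedOnParser` and `find_based_on` are hypothetical names.

```python
from html.parser import HTMLParser


class BasedOnParser(HTMLParser):
    """Collect the content of <meta name="dc.relation.isBasedOn"> tags."""

    def __init__(self):
        super().__init__()
        self.based_on = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Compare the name case-insensitively; keep the content value as written.
        name = (attr_map.get("name") or "").lower()
        if name == "dc.relation.isbasedon" and attr_map.get("content"):
            self.based_on.append(attr_map["content"])


def find_based_on(html):
    """Return the original URLs a page declares itself to be based on."""
    parser = BasedOnParser()
    parser.feed(html)
    return parser.based_on
```

Under this proposal, a viewer would query annotations for the returned URLs in read-only fashion while still creating new annotations against the forked page's own URL.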

Approach (2) has a number of consequences in this context, which may be seen as pros or cons and may be surprising:

The annotations will only be editable by the original author in the forked book, not the author who created the fork. That will prevent e.g. translation of annotations.
The annotations will live in their original group, which may be closed to contributions from people other than the original author. I gather that Steel has used the Public channel in his current books so this isn't a problem, but it could be for other users.
Any discussion associated with the original annotations will show up in the forked book.
Any new annotations made on the original after the fork will also show up in the forked book.

I'm inclined to suggest that approach (1) would be better. This can be implemented outside of the Hypothesis core. If in the process this turns up missing API capabilities then we could look at how to plug those capabilities.

Does this sound reasonable? If so I'm inclined to close this specific issue for the time being and reconsider only if cloning turns out not to be viable for some reason.

judell commented 6 years ago

Regarding approach (1), cloning, I have a tool almost ready to do that. (It's something we get asked for a lot.) I'll be on a short vacation later this week and early next, but after that, I'm hoping @SteelWagstaff can give it a try. It will, as per above, require a list of URLs for a book, one per chapter (or section).

greatislander commented 6 years ago

@judell If one visits any Pressbooks book, one can append /wp-json/pressbooks/v2/toc and get a JSON blob of all the book’s contents (which includes URLs). E.g. https://press.rebus.community/makingopentextbookswithstudents/wp-json/pressbooks/v2/toc
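A small Python sketch of turning that JSON blob into a list of page URLs. The exact shape of the TOC response is not described here, so the sketch avoids assuming field names: it walks the whole structure and collects any string that starts with the book's base URL, skipping REST API links. `toc_endpoint` and `collect_book_urls` are hypothetical helper names, and fetching the JSON is left to the caller.

```python
def toc_endpoint(book_url):
    """Return the TOC endpoint for a Pressbooks book URL."""
    return book_url.rstrip("/") + "/wp-json/pressbooks/v2/toc"


def collect_book_urls(node, book_url, found=None):
    """Recursively collect page URLs under `book_url` from parsed TOC JSON.

    Assumption: page URLs appear as string values somewhere in the blob.
    Strings containing /wp-json/ are treated as API links and skipped.
    """
    if found is None:
        found = []
    prefix = book_url.rstrip("/") + "/"
    if isinstance(node, dict):
        for value in node.values():
            collect_book_urls(value, book_url, found)
    elif isinstance(node, list):
        for item in node:
            collect_book_urls(item, book_url, found)
    elif isinstance(node, str):
        is_page = node.startswith(prefix) and "/wp-json/" not in node
        if is_page and node not in found:
            found.append(node)
    return found
```

The resulting list (one URL per front matter, chapter, or back matter item) is the kind of per-URL manifest a cloning tool could iterate over.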

judell commented 6 years ago

Thanks @greatislander! I wonder, @SteelWagstaff and others, if there's a nice example of an upstream Pressbooks book that has annotations that could be cloned (experimentally) to a downstream variant of the book.

SteelWagstaff commented 6 years ago

Sure--we've been doing a lot of learning activity development on this title: https://wisc.pb.unizin.org/frenchcscr/. Happy to do some testing on/with it and to help however I'm able. That book should be of particular interest for testing because it has several chapters, each chapter tends to have multiple annotations, and to my knowledge, all existing annotations on the public layer are top-level annotations (I don't think we have a lot of replies/child annotations there). We're actually in conversations right now with @heatherstaines about setting up a restricted group for the entire domain, and the idea would be to use it with this book so that we'd have a publicly visible layer that only included the annotations/learning activities that the instructor(s) wanted learners to see. Perhaps we could test the cloning routine/tool that you're building Jon in conjunction with that use case even?

judell commented 6 years ago

> Perhaps we could test the cloning routine/tool that you're building Jon in conjunction with that use case even?

Let's aim for that. As we discussed last week, the plan is only to clone top-level annotations for now. I'm thinking they'll show up downstream with an appended note that indicates where cloned from, when, and who was the original author there.

Hmm. Now that I think about it, it might make sense to at least show, and maybe link to, replies in the original location. But, first things first.
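The plan above (clone only top-level annotations and append a provenance note) might look roughly like this in Python. This is a sketch under assumptions, not the actual cloning tool: the function names and the wording of the note are hypothetical, replies are assumed to be identifiable by a non-empty `references` list, and the payload shape is modeled on annotation objects returned by the API, so it should be checked against the API documentation before use.

```python
def is_top_level(annotation):
    """Treat annotations without a non-empty `references` list as top-level."""
    return not annotation.get("references")


def build_clone_payload(annotation, source_base, target_base, group, cloned_on):
    """Build the body for creating a copy of `annotation` on the forked book.

    `source_base` and `target_base` are the URL prefixes of the original and
    forked books; `cloned_on` is a date string supplied by the caller.
    """
    uri = annotation["uri"]
    if not uri.startswith(source_base):
        raise ValueError(f"{uri} is not under {source_base}")
    new_uri = target_base + uri[len(source_base):]
    # Provenance note: where the annotation came from, when, and by whom.
    note = (
        f"\n\n(Cloned from {uri} on {cloned_on}; "
        f"original author: {annotation['user']})"
    )
    # Keep the selectors so the copy can anchor to the same text on the fork.
    targets = [{**target, "source": new_uri} for target in annotation.get("target", [])]
    return {
        "uri": new_uri,
        "text": annotation.get("text", "") + note,
        "tags": list(annotation.get("tags", [])),
        "target": targets,
        "group": group,
    }
```

A caller would filter fetched annotations with `is_top_level`, build one payload per annotation, and submit each as the authenticated user who owns the fork. Linking to replies at the original location could later be added to the note.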

SteelWagstaff commented 6 years ago

Jon--when you're back from vacation we're willing/ready to do some testing. We have a number of annotations on a particular title: https://wisc.pb.unizin.org/frenchcscr/ made by a 'corporate user' (UW_Madison.French: https://hypothes.is/users/UW_Madison.French) that we'd like to move from the public layer to a new restricted group that Heather and Arti set up for us. Is there a way to select and migrate these annotations in bulk either on your end or on ours, or possibly via the API? The key thing is that we'd want to target only those annotations made on the public layer on that particular book (urls that begin with the string https://wisc.pb.unizin.org/frenchcscr/) by the user UW_Madison.French.
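The selection criteria described above can be sketched as a client-side filter in Python. It assumes annotation rows already fetched from the API carry `user` (in `acct:name@authority` form), `group`, and `uri` fields, and that the public layer's group id is `__world__`. The function name `select_for_migration` is hypothetical, and the sketch covers only selecting the annotations, not moving them between groups.

```python
PUBLIC_GROUP = "__world__"  # assumed id of the public layer


def select_for_migration(rows, username, url_prefix, authority="hypothes.is"):
    """Return public annotations by `username` on URLs starting with `url_prefix`."""
    userid = f"acct:{username}@{authority}"
    return [
        row
        for row in rows
        if row.get("user") == userid
        and row.get("group") == PUBLIC_GROUP
        and row.get("uri", "").startswith(url_prefix)
    ]
```

For this case the call would be `select_for_migration(rows, "UW_Madison.French", "https://wisc.pb.unizin.org/frenchcscr/")`.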

klemay commented 3 years ago

Moving to product backlog

hypothesis/product-backlog: Find a way to show annotations from website X on a cloned website X' #1166