Create a way that annotated documents can be embedded on web pages.

DocumentCloud is a popular open source project and service that allows reference documents to be uploaded and then annotated and embedded for viewing inside web pages.

Here is an example: http://www.techdirt.com/articles/20131031/12394625090/feinstein-releases-fake-nsa-reform-bill-actually-tries-to-legalize-illegal-nsa-bulk-data-collection.shtml

There are two issues with this model:

1. Once a document is uploaded, the original source URL is no longer available as the canonical reference.
2. We can't annotate DocumentCloud documents at present, and DocumentCloud isn't focused on an Open Annotation (OA) model right now, though we have had discussions with them about possibly helping them integrate in the future.

It might be better if the original reference document, perhaps the PDF of the bill, could be embedded inside a frame in the web page (using PDF.js as the rendering engine?), just like DocumentCloud does. Instead of being uploaded to a secondary service, the document would be streamed into the frame from its original source.

This way, annotations could be laid on top of the embedded document by reference to its proper URL. These would be the same annotations that were made on the document where it lives natively.
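To make the idea concrete, here is a minimal sketch of how such an embed snippet could be generated. It assumes a PDF.js viewer hosted somewhere (the `viewer_url` default is a placeholder, not a real deployment) and relies on the stock PDF.js viewer reading the document location from its `file` query parameter. The `data-source-url` attribute is a hypothetical convention for telling the annotator which URL to key annotations on.

```python
from html import escape
from urllib.parse import quote


def build_pdf_embed(
    source_url: str,
    viewer_url: str = "https://example.org/pdfjs/web/viewer.html",  # placeholder
    width: int = 600,
    height: int = 800,
) -> str:
    """Return an <iframe> snippet that renders source_url through a PDF.js viewer."""
    # The stock PDF.js viewer takes the PDF location from the `file` parameter.
    src = f"{viewer_url}?file={quote(source_url, safe='')}"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        # Hypothetical attribute: the canonical URL annotations should refer to.
        f'data-source-url="{escape(source_url, quote=True)}" '
        f'width="{width}" height="{height}"></iframe>'
    )


print(build_pdf_embed("https://example.com/docs/bill.pdf"))
```

Note that streaming the PDF from its original host only works if that host allows cross-origin requests for the file, so this is a sketch of the happy path only.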

Yes, we definitely have to implement embedding (pieces of) documents into other documents, together with the relevant annotations.

I would like to divide this task into two sub-tasks:

Task1 - for existing documents with embedded content: specify the relation

We need to be able to specify (either manually or automatically) if a given part is actually coming from a different document, and when we have this information, we should load the annotations from the original document.
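One possible shape for this information, as a rough sketch: a small record that says "this part of this page comes from that document", plus a lookup that also fetches annotations for every source document a page embeds. The names (`EmbedRelation`, `selector`, the dict-based `store`) are all made up for illustration; they are not existing parts of our code base.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EmbedRelation:
    host_uri: str    # page that contains the embedded part
    selector: str    # where the part sits in the host page (format is hypothetical)
    source_uri: str  # document the part actually comes from


def uris_to_query(page_uri: str, relations: List[EmbedRelation]) -> List[str]:
    """The page itself, followed by every source document it embeds."""
    uris = [page_uri]
    for rel in relations:
        if rel.host_uri == page_uri and rel.source_uri not in uris:
            uris.append(rel.source_uri)
    return uris


def load_annotations(page_uri: str, relations: List[EmbedRelation],
                     store: Dict[str, List[str]]) -> List[str]:
    """Collect annotations for the page and for the documents embedded in it."""
    found: List[str] = []
    for uri in uris_to_query(page_uri, relations):
        found.extend(store.get(uri, []))
    return found


relations = [EmbedRelation("https://blog.example/post", "#embed-1",
                           "https://example.com/docs/bill.pdf")]
store = {"https://example.com/docs/bill.pdf": ["note on section 2"],
         "https://blog.example/post": ["comment on the article"]}
print(load_annotations("https://blog.example/post", relations, store))
```

Whether the relation is entered manually or detected automatically, the lookup side would stay the same.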

Similar issues in our tracker:

6 Ability to link quotes back to their sources ("[...] registers the connection. After this point, _any annotation happening on the source text (inside the quoted section) should appear on all the quotes that are linked back to the source, and vice versa_. [...] The user can proceed to submit reply to the highlight (either at the source document, or at the particular news article; it does not matter any more). Or he can annotate something inside the quote. [...]")
https://github.com/hypothesis/h/issues/766 support entity-context annotations (see discussion to see how this is relevant)
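The "appear on all the quotes, and vice versa" behaviour boils down to translating text ranges between the source and the quote. A minimal sketch, assuming the quote is a verbatim copy of the source section and that positions are plain character offsets (both are simplifying assumptions; `QuoteLink` and the function names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuoteLink:
    source_start: int  # offset of the quoted section in the source text
    quote_start: int   # offset of the quote in the quoting document
    length: int        # length of the quoted section, in characters


def _translate(start: int, end: int, from_start: int, to_start: int,
               length: int) -> Optional[Tuple[int, int]]:
    # Clip the range to the linked section, then shift it to the other document.
    lo = max(start, from_start)
    hi = min(end, from_start + length)
    if lo >= hi:
        return None  # the range does not touch the linked section
    shift = to_start - from_start
    return (lo + shift, hi + shift)


def source_to_quote(link: QuoteLink, start: int, end: int):
    """Map a range in the source text onto the quote, or None if outside it."""
    return _translate(start, end, link.source_start, link.quote_start, link.length)


def quote_to_source(link: QuoteLink, start: int, end: int):
    """Map a range inside the quote back onto the source text."""
    return _translate(start, end, link.quote_start, link.source_start, link.length)


link = QuoteLink(source_start=100, quote_start=20, length=50)
print(source_to_quote(link, 110, 120))  # range fully inside the quoted section
print(source_to_quote(link, 0, 50))     # range outside the quoted section
```

An annotation that only partly overlaps the quoted section is clipped here; whether it should instead be hidden is an open design choice.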

Task2 - for publishing new documents: Provide a way to easily embed content from other documents

This feature is not intended for "mere" readers/annotators; the intended users are bloggers/journalists/other publishers.

Similar issues in our tracker:

https://github.com/hypothesis/h/issues/141 Link: Ability to expose HTML / Widget code for embedding individual annotations

Compared to the issue above, what you are proposing is embedding not only individual annotations/highlights, but huge pieces of the original document, or maybe the whole document.

I agree that this should be done by using an iframe (just like embedded YouTube videos).

I am not sure which approach is better:

A) pointing the frame to the original document, or
B) pointing it to our own server, which can then serve the required segment of the wanted document

Approach A) might be easier to implement, because it does not require any server-side services. However, we might encounter unsolvable problems because of cross-domain restrictions. (Some pages refuse to be framed inside other pages, for example by sending an X-Frame-Options header.) Also, this approach does not allow embedding segments of documents; it's all or nothing.
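To get a feeling for how often approach A) would be blocked, we could inspect the response headers of a candidate document before offering it for embedding. The sketch below looks at the two standard mechanisms, `X-Frame-Options` and the `frame-ancestors` directive of `Content-Security-Policy`. It is a deliberately conservative heuristic (any restriction counts as "not embeddable by a third party"), not a faithful model of what browsers do; the function name is made up.

```python
from typing import Mapping


def framing_allowed(headers: Mapping[str, str]) -> bool:
    """Guess whether an arbitrary third-party page may frame this response."""
    normalized = {name.lower(): value for name, value in headers.items()}

    # DENY forbids all framing; SAMEORIGIN forbids framing by other sites.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in ("deny", "sameorigin"):
        return False

    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            # Conservative: only a bare wildcard is treated as open to anyone.
            return "*" in [p.lower() for p in parts[1:]]

    return True  # no framing restriction found in the headers


print(framing_allowed({"X-Frame-Options": "DENY"}))
print(framing_allowed({"Content-Type": "text/html"}))
```

Pages can also break out of frames with scripts, which no header check will reveal, so a passing result is only a hint.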

Approach B) is a bit more difficult, because for this to work, we have to either store the required document on our server, or provide some kind of proxy service (for acquiring the wanted content and passing it to the requesting page). Furthermore, we need to decide the format to use:

We can simply serve only the serialized string content (easy, but ugly), or
we can try to preserve the HTML structure (nicer, but then we also have to care for CSS... this can get really complicated.)

But with approach B), we would not have any cross-domain problems, and we can serve arbitrary ranges of documents, upon request.
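For the "serialized string content" variant of approach B), the core of the service could look something like the sketch below: flatten the HTML to text, then cut out the requested range. It uses only Python's standard `html.parser`; fetching, caching and the HTTP endpoint are left out. The whitespace normalization and the offset convention (character offsets into the flattened text) are my assumptions, and the function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def serialize_text(html: str) -> str:
    """Flatten an HTML document to its text, collapsing runs of whitespace."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join("".join(collector.chunks).split())


def extract_segment(html: str, start: int, end: int) -> str:
    """Return characters [start, end) of the flattened text."""
    return serialize_text(html)[max(start, 0):end]


page = ("<html><head><style>p{color:red}</style></head>"
        "<body><p>Hello <b>annotated</b> world.</p></body></html>")
print(extract_segment(page, 6, 15))
```

Preserving the HTML structure instead would need a real DOM range extraction plus the relevant CSS, which is where the complication mentioned above comes in.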

hypothesis / vision

Create a way that annotated documents can be embedded on web pages. #12
