Readable Unambiguous Referencing

GregSchofield commented 7 years ago

The context is making EPUBs of historical fiction and non-fiction.

The problem is to combine traditional citation, by page and section, with structural references that hold across multiple editions. Part of this is a matter of setting up independent standards, but it also needs a reader that allows for this sort of use.

The secondary problem is not to overburden the original text with needless markup.

Imagine 15 different editions of the same book with different pagination and some changes in text.

Finding a specific reference, by choosing the edition and then the reference, should be external to the text source, but within the EPUB.

This means that the structural identity within the markup has to be mediated visually and in substance, so that it can be found by the reader and copied from the reader, with the citation retained and the data fragment included.

My conclusion is that this can be elegantly achieved by using identifiable CSS fragments. For instance, an ‘edition’ fragment that simply suppresses (hides) some source material while revealing others (makes them visible), such as the differences between editions of Darwin’s Origin of Species.
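A minimal sketch of such an ‘edition’ fragment, written in Python only to generate the CSS. It assumes, hypothetically, that every variant passage is wrapped in an element with the class `variant` plus one class per edition it belongs to (for example `ed-1 ed-2`). The class names and the function `edition_css` are illustrative and not part of any EPUB specification.

```python
import re

# Hypothetical convention: edition keys are simple tokens usable in a class name.
EDITION_KEY = re.compile(r"^[A-Za-z0-9_-]+$")


def edition_css(known_editions, selected):
    """Return a CSS fragment that hides every variant not in the selected edition.

    Assumes markup such as <span class="variant ed-1 ed-2">...</span>.
    """
    if selected not in known_editions:
        raise ValueError(f"unknown edition: {selected!r}")
    if not EDITION_KEY.match(selected):
        raise ValueError(f"edition key not usable as a class name: {selected!r}")
    # A single :not() rule keeps passages shared by several editions visible.
    return f".variant:not(.ed-{selected}) {{ display: none; }}"


if __name__ == "__main__":
    print(edition_css(["1", "2", "6"], selected="1"))
```

Text outside any `variant` element is untouched, so only the passages that differ between editions need extra markup.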

Likewise, a pagination/section sheet would provide labels, or other indicators, within elements.

An ACTIVE reader, to use this, has to assemble the CSS fragments with the EPUB data into a JSON file that is copied. The reader would need plugin support, for this should be indolently reviewed and also integrated into things like Zotero.

A PASSIVE EPUB with a default view and a selection system so that the user gets the reference visuals that suit their immediate need.

Perhaps there is an entirely different way of handling compounded publications and unambiguous referencing.

GregSchofield commented 7 years ago

"independently" not "indolently" reviewed

llemeurfr commented 7 years ago

Hi Greg, your issue is interesting, but your exposition is still abstract, for me at least (a non-native English speaker). Let's see if I understood something:

Your text/proposal seems to introduce two notions: 1/ annotations: citing is about selecting and annotating a piece of (html) content.

Annotations should be external to the html content; it seems that we agree on that. W3C Web Annotations seem to be a good tool for that, even if they are not currently supported by the hypothes.is annotation toolkit currently added to Readium. Annotations will be associated with content via "locators" (discussion here). Annotations may currently be embedded in EPUB 3 files (spec here, but with no known implementation).
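To make this concrete, here is a minimal sketch of an external citation expressed as a W3C Web Annotation with a `TextQuoteSelector`, built in Python. The source URL, the note text and the function name `make_annotation` are assumptions for illustration; how a reading system would store or resolve such an annotation is not specified here.

```python
import json
import uuid


def make_annotation(source, exact, prefix="", suffix="", note=""):
    """Build a Web Annotation that targets a quoted passage in a resource."""
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": f"urn:uuid:{uuid.uuid4()}",  # an annotation needs its own IRI
        "type": "Annotation",
        "target": {"source": source, "selector": selector},
    }
    if note:
        annotation["body"] = {"type": "TextualBody", "value": note}
    return annotation


if __name__ == "__main__":
    example = make_annotation(
        source="http://example.org/origin-of-species/chapter2.xhtml",  # placeholder
        exact="These differences blend into each other in an insensible series",
        note="Chapter 2, paragraph 10",
    )
    print(json.dumps(example, indent=2))
```

The annotation lives outside the HTML content and only quotes the passage it targets, so the source text needs no extra markup.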

2/ editions and revisions: text can be revised (e.g. typo corrections and other minor edits) or edited more consistently. Electronic documents, like software, may therefore get major.minor versions (or an identifier + dates of publication).

Each version will usually be distributed in a properly identified EPUB file. It seems you're advocating the inclusion of different revisions in the same publication, with CSS tricks (CSS fragments in your proposal; are you referring to the CSS fragmentation module?) to enable a comparison of different revisions. Nothing forbids this in EPUB, but I would argue that such a feature is so specific that the corresponding UI should be in the publication itself, not in the reading system.

GregSchofield commented 7 years ago

Hi llemeurfr, it is my exposition that is the problem.

With EPUBs and HTML in general I am at best a dabbler who can only manage the most basic scripting.

First, my problem is not with EPUB editions, and the solution you gave is perfect for that.

I will have to read about Web Annotations carefully to see just what they are doing, so I cannot give an opinion yet.

I will need to read through the CSS fragmentation module; I am not sure. I have just used fragmented CSS as modules and thought it would be a simple extension of selecting different style sheets within an EPUB.

I thought that selecting CSS assembly modules would be a simple way, though something more sophisticated might be used. The simple way might be “first edition with pages.css”, “first edition with para.numbers.css” or “first edition with para.numbers & pages.css”, each designed for the specific content of the EPUB. I supposed this was how things would be used to choose between fixed and flowing versions.
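As a rough sketch of what a sheet like “first edition with para.numbers.css” might contain, the Python below generates one `::before` rule per paragraph. It assumes a hypothetical ID scheme of the form `c<chapter>p<paragraph>` (for example `c2p10`); both the scheme and the function `margin_label_css` are illustrative only.

```python
import re

# Hypothetical ID scheme: "c2p10" means chapter 2, paragraph 10.
ID_PATTERN = re.compile(r"^c(\d+)p(\d+)$")


def margin_label_css(paragraph_ids):
    """Return CSS rules that print a chapter.paragraph label before each paragraph."""
    rules = []
    for pid in paragraph_ids:
        match = ID_PATTERN.match(pid)
        if match is None:
            continue  # skip IDs that do not follow the assumed scheme
        chapter, paragraph = match.groups()
        rules.append(f'#{pid}::before {{ content: "{chapter}.{paragraph} "; }}')
    return "\n".join(rules)


if __name__ == "__main__":
    print(margin_label_css(["c2p9", "c2p10", "intro"]))
```

One caveat, as far as I know: text generated by `::before` is normally not part of the document text that a reader searches or copies, so the label may still have to be resolved from the paragraph ID by a script.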

What I was considering was using a \<p>'s ID (ID:XXX) and using CSS to place a margin reference before it, such as (chapter).(paragraph) 2.10, which can be searched. When the text is selected as HTML, a script looks up the CSS reference in use and the document's data and gives a fragment in JSON. It might look like this:

[{ "id": "BQFKDG7B", "href": "http://public domain/BQFKDG7B/On the Origin of Species.epub", "type": "book", "title": "On the Origin of Species: By Means of Natural Selection; or the Preservation of Favoured Races in the Struggle for Life", "publisher": "John Murray, Albemarle Street, W.", "publisher-place": "London, U.K.", "author": [ { "family": "Darwin", "given": "Charles" } ], "edition": "1.00", "issued": { "date-parts": [ ["1859"] ] }, "text": "Certainly no clear line of demarcation has as yet been drawn between species and sub-species — that is, the forms which in the opinion of some naturalists come very near to, but do not quite arrive at the rank of species; or, again, between sub-species and well-marked varieties, or between lesser varieties and individual differences. These differences blend into each other in an insensible series; and a series impresses the mind with the idea of an actual passage." }]

The JSON is overloaded with information, but it is easily filtered, say into \<a href="http://public domain/BQFKDG7B/On the Origin of Species.epub" arg="1.00, chap.2.10">The Origin of Species by Charles Darwin (1859). Edition 1.00. Chapter 2 paragraph 10.\<\/a>
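The filtering step could be sketched as below. This assumes a citation record with `href`, `title`, `author` and `issued` fields plus a hypothetical `edition` field, and it takes the chapter and paragraph numbers as separate arguments. The function name `citation_link` is made up for illustration, and it writes the reference into a `data-ref` attribute (the usual way to carry custom data in HTML) rather than `arg`.

```python
from html import escape


def citation_link(record, chapter, paragraph):
    """Reduce a citation record to a short, readable HTML link."""
    author = record["author"][0]
    year = record["issued"]["date-parts"][0][0]
    edition = record["edition"]  # hypothetical field, e.g. "1.00"
    short_title = record["title"].split(":")[0]  # drop the subtitle
    label = (
        f"{short_title} by {author['given']} {author['family']} ({year}). "
        f"Edition {edition}. Chapter {chapter} paragraph {paragraph}."
    )
    ref = f"{edition}, chap.{chapter}.{paragraph}"
    href = escape(record["href"], quote=True)
    return f'<a href="{href}" data-ref="{escape(ref, quote=True)}">{escape(label)}</a>'


if __name__ == "__main__":
    record = {
        "href": "http://example.org/origin.epub",  # placeholder URL
        "title": "On the Origin of Species: By Means of Natural Selection",
        "author": [{"family": "Darwin", "given": "Charles"}],
        "edition": "1.00",
        "issued": {"date-parts": [["1859"]]},
    }
    print(citation_link(record, chapter=2, paragraph=10))
```

The full record stays available for tools like Zotero, while the link carries only what a human reader needs to find the passage.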

However, the reader’s API would have to be able to provide the instance data to assemble the JSON quote. This is really why I wrote to your project, to see if the foundations for scholarship could be laid early. I would also point out the importance of TEI, and I cannot understand why it has not been included in EPUB from the start (XML plus CSS is used in very serious manuscript analysis and other scholastic studies).

Aside from referencing there are other things worth considering:

This intelligent referencing can include making an EPUB that references the Collected Works of Charles Darwin, where all the parts are outside of it as a library of separate EPUBs. I am currently referring to a 50-volume PDF collected works, which involves a number of lookups that could be eliminated by this.

I am also thinking of this in terms of emails and other documents that could be unambiguously referenced into collections or collated by reference to them as items or as parts. Plus authoring tools that might allow such records to be gathered into a single EPUB.

Of course this is all very primitive. I have been waiting on EPUBs to break out of the box, and with 3.1 it looks like EPUBs could become a much more general way of keeping all sorts of things, but my concern is to bring scholastic tools into the hands of enthusiasts rather than just academics.

I see the problem as taking the printed/scanned works of the past and making them into accurate EPUBs that outshine paper in every respect. However, we do not need bloated readers, hence I have tried to frame this as keeping the reader light, allowing for plugins and using what is already in use rather than a new or untried technology. I also figure that adding scripts for special purposes into the EPUB does not overburden authors with trying to fit a particular reader's features.

Thank you for your help and patience, I hope this makes things a little clearer.

readium / architecture

Readable Unambiguous Referencing #50