davidar opened this issue 9 years ago.
Past attempt in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=630181
@bramstein
Yes, @bramstein it would be fantastic to have your input on this :)
@rht Yeah, I saw that issue, and was somewhat amused by this comment:
[...] is a huge issue for web browsers, which sometimes have to deal with giant (think tens of megabytes) paragraphs.
I think the performance argument is not a very good one. It'll get slow with very large paragraphs, but there are ways around that (splitting the paragraph, falling back to the greedy line breaking algorithm, etc.). The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the glue and boxes model, it would require a significant rewrite (which browser vendors are understandably not huge proponents of).
As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS. All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).
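For readers who don't know the TeX box/glue plumbing: in TeX's model a line is a sequence of fixed-width boxes (words) separated by glue (spaces with a natural width plus an allowed stretch and shrink). The sketch below, using hypothetical `Box`/`Glue` names that are not Typeset's API, computes how far a candidate line must stretch or shrink (the adjustment ratio r) and a TeX-style badness of roughly 100·|r|³, capped at 10000. Lines that would have to shrink beyond their limit (r < −1) or stretch with no stretchable glue are treated as infeasible. Penalties, hyphenation and demerits are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Box:
    width: float  # fixed-width content, e.g. a word


@dataclass
class Glue:
    width: float    # natural width of the space
    stretch: float  # how much the space may grow
    shrink: float   # how much the space may shrink


Item = Union[Box, Glue]


def adjustment_ratio(items: List[Item], line_width: float) -> Optional[float]:
    """How much of the available stretch (r > 0) or shrink (r < 0) the line needs.

    Returns None when the line cannot be adjusted at all (no stretch or shrink).
    Assumes any glue at the break itself has already been dropped.
    """
    natural = sum(item.width for item in items)
    stretch = sum(item.stretch for item in items if isinstance(item, Glue))
    shrink = sum(item.shrink for item in items if isinstance(item, Glue))
    if natural < line_width:
        return (line_width - natural) / stretch if stretch > 0 else None
    if natural > line_width:
        return (line_width - natural) / shrink if shrink > 0 else None
    return 0.0


def badness(r: Optional[float]) -> float:
    """TeX-style approximation: 100 * |r|^3, capped at 10000; infeasible lines are infinite."""
    if r is None or r < -1:
        return float("inf")
    return min(100 * abs(r) ** 3, 10000.0)
```

A line breaker such as Knuth-Plass evaluates this for every feasible breakpoint and keeps the sequence of breaks with the lowest total cost, which is where the performance concern for very long paragraphs comes from.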
I think the performance argument is not a very good one.
Me neither.
The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the glue and boxes model, it would require a significant rewrite (which browser vendors are understandably not huge proponents of).
Frankly, I'd be happy with anything more intelligent than the greedy algorithm used by browsers (somewhat disturbingly, it seems IE is currently the only one supporting something like this).
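To make the contrast with the greedy algorithm concrete, here is a simplified sketch comparing greedy first-fit breaking with a dynamic-programming breaker that minimises the total squared slack of all lines except the last. This is not the full Knuth-Plass algorithm (there is no glue stretch/shrink, no penalties, no hyphenation and no demerits); it only illustrates why considering the whole paragraph beats deciding one line at a time. Measuring widths in characters with every space one unit wide is an assumption made for the example.

```python
from typing import List

Line = List[str]


def greedy_breaks(words: List[str], width: int) -> List[Line]:
    """First-fit: put as many words on each line as will fit, then move on."""
    lines: List[Line] = []
    current: Line = []
    length = 0  # characters used on the current line, including single spaces
    for word in words:
        needed = len(word) if not current else length + 1 + len(word)
        if current and needed > width:
            lines.append(current)  # line is full: commit it and start a new one
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words: List[str], width: int) -> List[Line]:
    """Minimise the sum of squared trailing slack over all lines but the last."""
    n = len(words)
    cost = [float("inf")] * (n + 1)  # cost[i]: best cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: index just past the line starting at i
    cost[n] = 0.0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width and j > i:
                break  # words[i..j] no longer fit; a lone overlong word is still allowed
            slack = width - length
            line_cost = 0 if j == n - 1 else slack * slack  # the last line is free
            if line_cost + cost[j + 1] < cost[i]:
                cost[i], nxt[i] = line_cost + cost[j + 1], j + 1
    lines: List[Line] = []
    i = 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines
```

With `words = "aaa bb cc ddddd".split()` and a width of 6, the greedy version produces `aaa bb / cc / ddddd` (slack 0 then 4, cost 16), while the dynamic-programming version produces `aaa / bb cc / ddddd` (slack 3 then 1, cost 10): filling the first line greedily leaves a very short second line.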
As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS.
Definitely, I only intend to support the basic subset output by LaTeX-to-HTML conversion tools like tex4ht or LaTeXML
All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).
That's good to hear
@bramstein I know you've said that typeset.js is likely never to be production-ready, but how much work would it take to make it robust enough for the specific use case I'm interested in here? That is, I can drop the script into a basic HTML document and it Just Works. For context, I'd like to (eventually) be able to produce HTML versions of the articles in the Creative Commons arXiv subset (#1) that look (almost) as good as the PDFs. It would be great if this included Knuth-Plass line breaking (but I'm not a web developer, so I'm somewhat limited in what I can achieve myself).
The bigger issue is that some parts of CSS are incompatible with the TeX model.
Do you have an example that pinpoints this incompatibility? (I don't know much about TeX's box/glue plumbing.)
It's either full TeX typesetting on a subset of HTML/CSS/JS, or parts of TeX on full HTML/CSS/JS (which, e.g. for math, is already well supported). @bramstein, why do you suggest the former?
If the goal is to better format the tex4ht/LaTeXML output of scientific papers, then the former is preferable. If the goal is to bring TeX-quality typesetting to the web, the latter can be done piecemeal: https://github.com/w3c/dpub-pagination (why would there be page breaks in a web document?).
Also, mind the format size:
(This one needs justification as well: https://github.com/worrydream/EarlyHistoryOfSmalltalk)
(...what is it like to read books that were originally paginated, but without the help of page breaks?)
@rht most of that 4MB is poorly compressed images, which can be improved (e.g. using SVG instead of PNG)
Re pagination: I don't think trying to emulate physical books too closely is a good idea, but something definitely needs to be done to improve location memory
Edit: it would be cool if you could leverage something like https://en.m.wikipedia.org/wiki/Method_of_loci for this purpose, e.g. gradually changing the background colour/pattern/image as you scroll down the page.
Another example: https://ipfs.io/ipfs/Qmav57P5mmwcpUtmgRb2tp9j6YpXZdgobxDv7VBeJtgtCp/
@davidar sorry for the late reply. I wonder if it would be useful to have more fine-grained hrefs (paragraph, section), like https://github.com/ipfs/go-ipfs/blob/master/core/bootstrap.go#L4. The paper itself was uploaded in 2011: http://arxiv.org/pdf/1104.2778v1.pdf.
For the images, there is also https://www.npmjs.com/package/gulp-imagemin.
For the qualia of a book, now that it is confined to a flat screen, it has fewer attributes and becomes less of a physical 'thing'. This stuff is more related to #2.
As for the experiment with the method of loci, it would have been preferable for the author to incorporate this method from the beginning, because it is more subjective (there is a risk of fogging the intention of the author) and more pervasive than just a change in font or layout. If I were to use one, I'd construct it such that the mnemonic is naturally connected to the text, e.g. a book about the innards of a ship (thought) & the ship it describes (extension).
(Papers are often annotated externally, but code isn't; it is instead referred to by ranges of line numbers. Edit: but code review (CR) is annotation.)
@davidar that looks really good!! maybe soon we'll have damn clickable references :)
@davidar that looks really good!! maybe soon we'll have damn clickable references :)
Imported modules in code aren't clickable either (except with Sourcegraph).
The raw version of https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-p2p-file-system.pdf has clickable references.
@rht yes, section/paragraph linking is definitely something I'd like to do
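One simple way to make paragraphs linkable, sketched here under assumptions (the `sec2.3-p2` id scheme and the `number_paragraphs` helper are hypothetical, not something TeX.js is confirmed to do), is to give every paragraph a positional id so that a URL fragment can point straight at it, much like the `#L4` line anchors GitHub uses for code:

```python
from html import escape
from typing import List, Tuple


def number_paragraphs(sections: List[Tuple[str, List[str]]]) -> str:
    """Render (section number, paragraphs) pairs as HTML with linkable ids.

    Each section gets id "sec<number>" and each paragraph "sec<number>-p<i>",
    so a link like page.html#sec2.3-p2 targets the second paragraph of 2.3.
    """
    out: List[str] = []
    for number, paragraphs in sections:
        section_id = f"sec{escape(number)}"
        out.append(f'<section id="{section_id}">')
        for i, text in enumerate(paragraphs, start=1):
            out.append(f'<p id="{section_id}-p{i}">{escape(text)}</p>')
        out.append("</section>")
    return "\n".join(out)
```

Positional ids are easy to generate but shift when paragraphs are inserted or removed; ids derived from a hash of the paragraph text survive reordering instead, at the cost of changing whenever the paragraph itself is edited.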
For the qualia of a book, now that it is confined to a flat screen, it has less attributes and becomes less of a physical 'thing'.
Yeah, I'm not trying to emulate a physical book, but I'd like to remedy some of the deficiencies of on-screen reading in terms of recall, etc.
Because it is more subjective (there is risk of fogging the intention of the author), and more pervasive than just a change in font or layout.
I've experimented with subtly changing the background colour based on scroll position, which seemed to work quite nicely, although it had some technical problems, so I decided to take it out for the moment.
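The thread doesn't say how the colour was derived, so the following is only a hypothetical sketch of such a mapping: the scroll position is normalised to a fraction (0 to 1) of the scrollable distance and linearly interpolated between two hues, with low saturation and high lightness to keep the effect subtle. The hue endpoints and the `scroll_background` name are assumptions. In a browser the three inputs would correspond to `window.scrollY`, `document.documentElement.scrollHeight` and `window.innerHeight`, and the result would be assigned to the page's background colour in a scroll handler; it is written in Python here only to keep the examples in one language.

```python
def scroll_background(scroll_top: float, scroll_height: float, viewport_height: float,
                      hue_start: float = 200.0, hue_end: float = 40.0,
                      saturation: int = 30, lightness: int = 96) -> str:
    """Map how far the reader has scrolled to a pale CSS hsl() background colour."""
    scrollable = scroll_height - viewport_height
    # Fraction of the document scrolled past, clamped to [0, 1];
    # a page that fits in the viewport always gets the starting hue.
    fraction = 0.0 if scrollable <= 0 else min(max(scroll_top / scrollable, 0.0), 1.0)
    hue = hue_start + (hue_end - hue_start) * fraction
    return f"hsl({hue:.0f}, {saturation}%, {lightness}%)"
```

Because the colour depends only on position, not on content, it acts purely as a "roughly two-thirds of the way through" cue rather than a semantic one.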
@davidar that looks really good!! maybe soon we'll have damn clickable references :)
@jbenet Yes, that's definitely on my radar, I really hate traditional bibliographies (there's this thing called hyperlinks, people). Of course, you can't hyperlink a dead tree, but who prints stuff these days?
The raw of https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-p2p-file-system.pdf has clickable references.
Cool, although it's not quite as seamless as it could be (e.g. having a citation link directly to the section of the article the author is referencing).
I've broken this into a separate project now: https://davidar.io/TeX.js/
Please submit bugs / feature requests to https://github.com/davidar/TeX.js/issues
Since the HTML page can't be annotated (or PR-ed), here is a comment on the source instead: https://github.com/davidar/TeX.js/commit/2268e1166db6d856f1933f3872667f05a9f5f2d4#diff-eacf331f0ffc35d4b482f1d15a887d3bR19 (more citation needed)
I thought Apple had brought typography to the web? The OS in the screenshot is NeXTSTEP. But indeed, there was no hyphenation in the Retina iOS book reader in 2010: http://www.subtraction.com/2010/06/08/better-screen-same-typography/.
I've experimented with subtly changing the background colour based on scroll position
But again, this is just a mnemonic tool (associating two random, loosely related facts, much like naming star constellations), unless the background color is calculated from the aggregate sentiment of the text in a page/paragraph or something (and even then there is still a risk of fogging the author's intention).
traditional bibliographies (there's this thing called hyperlinks, people)
The recent (on the TeX timescale) biblatex package displays the URL by default if the field exists, but this is the amount of boilerplate code needed for clickable refs: https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-cap2pfs.tex#L12-L27.
having a citation link directly to the section of the article the author is referencing
The ecosystem doesn't exist yet, but meanwhile, this can be done manually by the author, e.g.
Similarly, to cite the definition of merkledag in the paper, https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf #section2.3sentence2footnote.
Hyphenation on the web, 2011: http://blog.fontdeck.com/post/9037028497/hyphens.
Since the html page can't be annotated (/PR-ed)
You're welcome to PR the HTML page (is there a difficulty in doing so?)
I thought apple had brought typography to the web? The os in the screenshot is NeXTSTEP.
Yes, I'm having trouble seeing the relevance here though?
But again, this is just a mnemonic tool
Of course. I'm not trying to associate semantically meaningful images to the text, I'm simply trying to improve the ability to recall the position in the text where you read something. The baseline is "I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book", so I'm not aiming for anything more meaningful than that.
biblatex package by default displays url if the field exists
I'm not sure if this is what @jbenet meant, but personally I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations. (Although someone can generate a bibliography from this information if they so desire.)
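If citations carry their links inline, a conventional bibliography can still be generated from them. Here is a minimal sketch using Python's standard `html.parser`, assuming (hypothetically) that in-text citations are marked up as `<a class="cite" href="...">`; it collects cited URLs in order of first appearance and uses the link text of the first citation as the entry label. The class name and function names are illustrative, not part of TeX.js.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class CitationCollector(HTMLParser):
    """Collects (href, link text) for every <a class="cite"> element, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.citations: List[Tuple[str, str]] = []
        self._href: Optional[str] = None  # href of the citation currently open, if any
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "cite" in classes and attr.get("href"):
            self._href = attr["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text may arrive in several chunks

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.citations.append((self._href, "".join(self._text).strip()))
            self._href = None


def bibliography(html: str) -> List[Tuple[str, str]]:
    """Return unique cited URLs with the label of their first citation."""
    parser = CitationCollector()
    parser.feed(html)
    parser.close()
    entries: Dict[str, str] = {}
    for href, text in parser.citations:
        entries.setdefault(href, text)  # dicts keep first-insertion order
    return list(entries.items())
```

Ordinary links, such as a `#s2` cross-reference to another section, are skipped because they lack the `cite` class.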
The ecosystem doesn't exist yet
That's why I'm trying to bootstrap the ecosystem with the arXiv corpus ;)
You're welcome to PR the HTML page (is there a difficulty in doing so?)
I mean that the rendered paper (https://davidar.io/TeX.js/) can't be annotated, so I can only comment on the source code.
"I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book"
That is still a more precise address than referring to a background color shade.
I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations
I had thought of that when parsing what 'clickable references' means. But Wikipedia still displays the references in a separate section: https://en.wikipedia.org/wiki/Bibliography#References.
Edit: s/background color/background color shade/
@rht You should now be able to directly annotate https://davidar.io/TeX.js/ (and any other page using TeX.js) thanks to @hypothesis (cc @RichardLitt @nickstenning) :smile:
Note to self: think about integrating @ipfs and @hypothesis (cc @jbenet)
@davidar yes, we should do that. There's a lot of overlap.
cc @tilgovi: we should put public annotations on IPFS, and also, once we get capabilities, private ones too.
@davidar this works very well, good stuff!
Would this be a good repo to open an issue for designing and discussing ipfs comments?
@tilgovi i think so
Opened #12.
@davidar where is your Hypothesis-annotated version? I'm not finding it.
@jbenet the Hypothesis-enabled version doesn't seem to have made it into IPFS yet; I'll add it to my to-do list.
cc @bigbluehat
It has long been possible to convert TeX to HTML (#1). However, I think it's fair to say that the results are often hideous, as web browsers (by default) suck at typesetting compared to TeX. Fortunately, it is now possible to work around some of these deficiencies with JS and CSS, which I've tied together in this demo: https://davidar.io/TeX.js/ (https://github.com/davidar/TeX.js)

The aim of this is to achieve (an approximation to) the professional quality of TeX typesetting, whilst integrating with the web and optimising for on-screen viewing better than a PDF viewer can.