TeX for the web - GitHub Issues

davidar commented 9 years ago

It has long been possible to convert TeX to HTML (#1). However, I think it's fair to say that the results are often hideous, as web browsers (by default) suck at typesetting compared to TeX. Fortunately, it is now possible to work around some of these deficiencies with JS and CSS, which I've tied together in this demo: https://davidar.io/TeX.js/ (source: https://github.com/davidar/TeX.js)

The aim of this is to achieve (an approximation to) the professional quality of TeX typesetting, whilst integrating with the web and optimising for on-screen viewing better than a PDF viewer can.

rht commented 9 years ago

A past attempt in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=630181

rht commented 9 years ago

@bramstein

davidar commented 9 years ago

Yes, @bramstein it would be fantastic to have your input on this :)

@rht Yeah, I saw that issue, and was somewhat amused by this comment:

[...] is a huge issue for web browsers, which sometimes have to deal with giant (think tens of megabytes) paragraphs.

bramstein commented 9 years ago

I think the performance argument is not a very good one. It'll get slow with very large paragraphs, but there are ways around that (splitting the paragraph, falling back to the greedy line breaking algorithm, etc.). The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the glue and boxes model, it'll require a significant rewrite (which browser vendors are understandably not a huge proponent of).
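For reference, the greedy (first-fit) strategy that browsers use, and that is mentioned above as a fallback for very large paragraphs, can be sketched as follows. This is a minimal sketch, assuming the paragraph is already split into words and that widths come from a caller-supplied `measure` function (a hypothetical stand-in for real font metrics); `greedyBreak` is an illustrative name, not Typeset's API.

```typescript
// Greedy (first-fit) line breaking: keep adding words to the current line
// until the next one would overflow, then start a new line. Each line is
// decided once and never revisited.
type Measure = (text: string) => number;

function greedyBreak(
  words: string[],
  lineWidth: number,
  measure: Measure,   // assumed stand-in for real font metrics
  spaceWidth: number, // assumed fixed interword space
): string[][] {
  const lines: string[][] = [];
  let current: string[] = [];
  let used = 0; // width already occupied on the current line

  for (const word of words) {
    const w = measure(word);
    // Width of the current line if this word (plus a space) were appended.
    const needed = current.length === 0 ? w : used + spaceWidth + w;
    if (current.length > 0 && needed > lineWidth) {
      lines.push(current); // close the line and carry the word over
      current = [word];
      used = w;
    } else {
      // A word wider than lineWidth still gets a line of its own (it overflows).
      current.push(word);
      used = needed;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}
```

For example, `greedyBreak(["aaa", "bb", "cc", "dddd"], 6, (s) => s.length, 1)` yields `[["aaa", "bb"], ["cc"], ["dddd"]]`: the first line is filled exactly, leaving a lot of empty space on the second. Because every decision is local, the work is linear in the paragraph length, which is what makes it a safe fallback for huge paragraphs.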

As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS. All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).

davidar commented 9 years ago

I think the performance argument is not a very good one.

Me neither.

The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the glue and boxes model, it'll require a significant rewrite (which browser vendors are understandably not a huge proponent of).

Frankly, I'd be happy with anything more intelligent than the greedy algorithm used by browsers (somewhat disturbingly, it seems IE is the only one currently supporting something like this).
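To make "more intelligent than greedy" concrete, here is a sketch of total-fit line breaking by dynamic programming, which chooses all the breaks of a paragraph at once. It is a deliberately simplified cousin of Knuth-Plass, not the algorithm itself: it assumes fixed interword spaces (no stretch/shrink glue), no penalties or hyphenation, and a cost equal to the sum of squared leftover space on every line except the last. `minRaggedBreak` and its parameters are hypothetical names.

```typescript
// Total-fit line breaking by dynamic programming (simplified, NOT full
// Knuth-Plass): minimise the sum of squared leftover space over all lines
// except the last, considering every feasible set of break points at once.
function minRaggedBreak(
  widths: number[],   // pre-measured word widths (assumed)
  lineWidth: number,
  spaceWidth: number, // fixed interword space; real glue would stretch/shrink
): number[] {
  const n = widths.length;
  // best[i]: minimal cost of setting words i..n-1; next[i]: first word of the following line.
  const best = new Array<number>(n + 1).fill(Infinity);
  const next = new Array<number>(n + 1).fill(n);
  best[n] = 0;

  for (let i = n - 1; i >= 0; i--) {
    let width = -spaceWidth; // so the first word adds no leading space
    for (let j = i; j < n; j++) {
      width += spaceWidth + widths[j];
      if (width > lineWidth && j > i) break; // words i..j no longer fit
      const slack = lineWidth - width; // negative only for a single overlong word
      const lineCost = j === n - 1 ? 0 : slack * slack; // last line is free
      const cost = lineCost + best[j + 1];
      if (cost < best[i]) {
        best[i] = cost;
        next[i] = j + 1;
      }
    }
  }

  // Follow the recorded choices; returns the index of the first word on each line.
  const starts: number[] = [];
  for (let i = 0; i < n; i = next[i]) starts.push(i);
  return starts;
}
```

On word widths `[3, 2, 2, 4]` with a line width of 6 and unit spaces, greedy filling produces `aaa bb | cc | dddd` (slack 0 and 4, cost 16), whereas this sketch returns `[0, 1, 3]`, i.e. `aaa | bb cc | dddd` (slack 3 and 1, cost 10): the looseness is spread out instead of being dumped on one line.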

As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS.

Definitely. I only intend to support the basic subset output by LaTeX-to-HTML conversion tools like tex4ht or LaTeXML.

All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).

That's good to hear

@bramstein I know you've said that typeset.js is unlikely ever to be production-ready, but how much work would it take to make it robust enough to handle the specific use case I'm interested in here? As in, I can drop the script into a basic HTML document and it Just Works. For context, I'd like to (eventually) be able to produce HTML versions of the articles in the Creative Commons arXiv subset (#1) that look (almost) as good as the PDFs. It would be great if this included Knuth-Plass line breaking (but I'm not a web developer, so I'm somewhat limited in what I can achieve myself).

davidar commented 9 years ago

Alright, here's my first approximation (just using greedy justification for now):

Before
After
PDF for comparison
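Justifying a line once its breaks have been chosen (greedily, in this first approximation) amounts to sharing the leftover space among the interword gaps. A minimal sketch, assuming pre-measured word widths and an equal share per gap; TeX instead distributes stretch in proportion to each glue item's stretchability. `justifyLine` is a hypothetical name. With sub-pixel positioning, the fractional offsets it produces can be applied directly.

```typescript
// Justify one already-broken line: share the leftover space equally among
// the interword gaps and return the x offset of each word.
function justifyLine(
  widths: number[],   // pre-measured word widths on this line (assumed)
  lineWidth: number,
  spaceWidth: number, // natural interword space
  isLastLine: boolean,
): number[] {
  const gaps = widths.length - 1;
  const natural = widths.reduce((sum, w) => sum + w, 0) + spaceWidth * Math.max(gaps, 0);
  // The last line of a paragraph and single-word lines keep natural spacing.
  const gap = isLastLine || gaps <= 0 ? spaceWidth : spaceWidth + (lineWidth - natural) / gaps;

  const xs: number[] = [];
  let x = 0;
  for (const w of widths) {
    xs.push(x);
    x += w + gap;
  }
  return xs;
}
```

For instance, `justifyLine([2, 2, 2], 10, 1, false)` returns `[0, 4, 8]`: both gaps grow from 1 to 2 units so the last word ends exactly at 10. On a loose greedy line, the same arithmetic produces the conspicuously wide gaps that better line breaking is meant to avoid.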

rht commented 9 years ago

The bigger issue is that some parts of CSS are incompatible with the TeX model.

Do you have an example that pinpoints this incompatibility? (I don't know much about TeX's box/glue plumbing.)

It's either full TeX typesetting onto a subset of HTML/CSS/JS, or parts of TeX on full HTML/CSS/JS (which, e.g. for math, is already well supported). @bramstein Why do you suggest the former?

If the goal is to better format the tex4ht/LaTeXML output of scientific papers, then the former is preferable. If the goal is to bring TeX-quality typesetting to the web, the latter can be done piecemeal, e.g. https://github.com/w3c/dpub-pagination (though why would there be page breaks in a web document?).

Also, mind the file sizes:

pdf: 352KB
mhtml: 4.1MB
justified mhtml: 4.3MB
justified mhtml.tar.bz2: 3.1MB

(This one needs justification as well: https://github.com/worrydream/EarlyHistoryOfSmalltalk)

rht commented 9 years ago

(...what is it like to read books that were originally paged, but without the help of page breaks?)

davidar commented 9 years ago

@rht most of that 4MB is poorly compressed images, which can be improved (e.g. by using SVG instead of PNG).

Re pagination: I don't think trying to emulate physical books too closely is a good idea, but something definitely needs to be done to improve location memory.

Edit: it would be cool if you could leverage something like https://en.m.wikipedia.org/wiki/Method_of_loci for this purpose, e.g. gradually changing the background colour/pattern/image as you scroll down the page.

davidar commented 9 years ago

Another example: https://ipfs.io/ipfs/Qmav57P5mmwcpUtmgRb2tp9j6YpXZdgobxDv7VBeJtgtCp/

rht commented 9 years ago

@davidar sorry for the late reply. I wonder if it would be useful to have more fine-grained hrefs (paragraph, section), like https://github.com/ipfs/go-ipfs/blob/master/core/bootstrap.go#L4. The paper itself was uploaded in 2011: http://arxiv.org/pdf/1104.2778v1.pdf.

For the images, there is also https://www.npmjs.com/package/gulp-imagemin.

For the qualia of a book, now that it is confined to a flat screen, it has fewer attributes and becomes less of a physical 'thing'. This stuff is more related to #2.

As for the method-of-loci experiment, it would be preferable if the author incorporated this method from the beginning, because it is more subjective (there is a risk of fogging the intention of the author) and more pervasive than just a change in font or layout. If I were to use one, I'd construct it so that the mnemonic is naturally connected to the text, e.g. a book about the innards of a ship (thought) and the ship it describes (extension).

rht commented 9 years ago

(Papers are often annotated externally, but code isn't; it is instead referred to by ranges of line numbers. Edit: but CR (code review) is annotation.)

jbenet commented 9 years ago

@davidar that looks really good!! maybe soon we'll have damn clickable references :)

rht commented 9 years ago

@davidar that looks really good!! maybe soon we'll have damn clickable references :)

Imported modules in code aren't clickable either (except with Sourcegraph).

rht commented 9 years ago

The raw version of https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-p2p-file-system.pdf has clickable references.

davidar commented 9 years ago

@rht yes, section/paragraph linking is definitely something I'd like to do

For the qualia of a book, now that it is confined to a flat screen, it has less attributes and becomes less of a physical 'thing'.

Yeah, I'm not trying to emulate a physical book, but I'd like to remedy some of the deficiencies of on-screen reading in terms of recall, etc.

Because it is more subjective (there is risk of fogging the intention of the author), and more pervasive than just a change in font or layout.

I've experimented with subtly changing the background colour based on scroll position, which seemed to work quite nicely, although it had some technical problems, so I decided to take it out for the moment.

@davidar that looks really good!! maybe soon we'll have damn clickable references :)

@jbenet Yes, that's definitely on my radar, I really hate traditional bibliographies (there's this thing called hyperlinks, people). Of course, you can't hyperlink a dead tree, but who prints stuff these days?

The raw of https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-p2p-file-system.pdf has clickable references.

Cool, although it's not quite as seamless as it could be (e.g. having a citation link directly to the section of the article the author is referencing).

davidar commented 9 years ago

I've broken this into a separate project now: https://davidar.io/TeX.js/

Please submit bugs / feature requests to https://github.com/davidar/TeX.js/issues

rht commented 9 years ago

Since the HTML page can't be annotated (/PR-ed), here is a comment on the source instead: https://github.com/davidar/TeX.js/commit/2268e1166db6d856f1933f3872667f05a9f5f2d4#diff-eacf331f0ffc35d4b482f1d15a887d3bR19 (more citation needed)

I thought Apple had brought typography to the web? The OS in the screenshot is NeXTSTEP. But indeed, there was no hyphenation in the Retina iOS book reader in 2010: http://www.subtraction.com/2010/06/08/better-screen-same-typography/.

I've experimented with subtly changing the background colour based on scroll position

But again, this is just a mnemonic tool (associating two random, slightly related facts, much like naming star constellations), unless the background color is calculated based on the aggregate sentiment of the text in a page/paragraph or something (and even then there is still a risk of fogging the author's intention).

traditional bibliographies (there's this thing called hyperlinks, people)

The (by TeX timescales) recent biblatex package by default displays the URL if the field exists, but this is the amount of boilerplate code needed for clickable refs: https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-cap2pfs.tex#L12-L27.

having a citation link directly to the section of the article the author is referencing

The ecosystem doesn't exist yet, but meanwhile, this can be done manually by the author, e.g.

"Git has already influenced distributed filesystem design". The fact is stated in http://sigops.org/sosp/sosp13/papers/p151-mashtizadeh.pdf #section3.1sentence1.
"Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily." The fact referred is in https://www.cl.cam.ac.uk/~lw525/publications/P2P2013_13.pdf #sectionIV.Fsentence2.

Similarly, to cite the definition of a merkledag in the paper: https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf #section2.3sentence2footnote.
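The ad hoc fragment syntax in these examples (`#section…sentence…`, optionally followed by `footnote`) is regular enough to parse mechanically, which is what a future tool would need in order to resolve such citations. A minimal sketch, assuming section labels never themselves contain the word `sentence`; `parseCitationFragment`, `citationTargetOf` and the `CitationTarget` shape are hypothetical, not part of any existing tool.

```typescript
// Parsed form of a fragment such as "#section3.1sentence1".
interface CitationTarget {
  section: string;   // e.g. "3.1" or "IV.F"
  sentence: number;  // 1-based sentence index within the section
  footnote: boolean; // true if the target is that sentence's footnote
}

// Parse "#section<label>sentence<n>[footnote]" (the leading "#" is optional).
// The lazy (.+?) stops at the first "sentence" after "section".
function parseCitationFragment(fragment: string): CitationTarget | null {
  const m = /^#?section(.+?)sentence(\d+)(footnote)?$/.exec(fragment);
  if (m === null) return null;
  return { section: m[1], sentence: Number(m[2]), footnote: m[3] !== undefined };
}

// Convenience wrapper: take everything from the first "#" of a full URL.
function citationTargetOf(url: string): CitationTarget | null {
  const hash = url.indexOf("#");
  return hash < 0 ? null : parseCitationFragment(url.slice(hash));
}
```

Applied to the three examples above, it yields section `"3.1"` sentence 1, section `"IV.F"` sentence 2, and section `"2.3"` sentence 2 with `footnote: true`. Resolving such a target to an actual location would still require the cited document to expose section- and sentence-level anchors, which is the part of the ecosystem that does not exist yet.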

rht commented 9 years ago

Hyphenation on the web (2011): http://blog.fontdeck.com/post/9037028497/hyphens

davidar commented 9 years ago

Since the html page can't be annotated (/PR-ed)

You're welcome to PR the HTML page (is there a difficulty in doing so?)

I thought apple had brought typography to the web? The os in the screenshot is NeXTSTEP.

Yes, though I'm having trouble seeing the relevance here.

But again, this is just a mnemonic tool

Of course. I'm not trying to associate semantically meaningful images to the text, I'm simply trying to improve the ability to recall the position in the text where you read something. The baseline is "I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book", so I'm not aiming for anything more meaningful than that.

biblatex package by default displays url if the field exists

I'm not sure if this is what @jbenet meant, but personally I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations. (Although someone can generate a bibliography from this information if they so desire.)

The ecosystem doesn't exist yet

That's why I'm trying to bootstrap the ecosystem with the arXiv corpus ;)

rht commented 9 years ago

You're welcome to PR the HTML page (is there a difficulty in doing so?)

I mean, the display of the paper (https://davidar.io/TeX.js/) can't be annotated, so I can only comment on the source code.

"I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book"

That is still a more precise address than referring to a background color shade.

I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations

I had thought of that when parsing what 'clickable references' means. But Wikipedia still displays the references in a section: https://en.wikipedia.org/wiki/Bibliography#References.

Edit: s/background color/background color shade/

davidar commented 9 years ago

I plan on integrating https://hypothes.is soon, so stay tuned ;)
Yes, we need to balance precision against recall. The essential feature of recalling location in physical books is a combination of a low-frequency (approximate position in the book) and a high-frequency (left/right, top/bottom of the page) component. So perhaps two colours could work better? Note that I'm not talking about communicating locations, but about subconscious recall.
But Wikipedia also has popups when you hover over citations in the text. You can certainly have both, yes.
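The two-colour idea (a low-frequency and a high-frequency location cue) can be sketched as a pure function from scroll position to a CSS colour: a hue that drifts slowly across the whole document ("how far through the book") and a faint lightness wave that repeats once per screenful ("where on the page"). This is only an illustration under assumed constants (hue range, saturation, lightness band); `locationColour` is a hypothetical name and not part of TeX.js.

```typescript
// Map a scroll position to a background colour with two components:
//  - low frequency: hue drifts from 0 deg (red) at the top of the document,
//    through 120 deg (green), to 240 deg (blue) at the bottom;
//  - high frequency: lightness wobbles between 94% and 98% once per "page"
//    (pageHeight pixels, e.g. one viewport height).
// All constants are illustrative assumptions, chosen to keep text contrast high.
function locationColour(scrollTop: number, maxScroll: number, pageHeight: number): string {
  const overall = maxScroll > 0 ? Math.min(Math.max(scrollTop / maxScroll, 0), 1) : 0;
  const hue = Math.round(240 * overall);
  const withinPage = pageHeight > 0 ? (scrollTop % pageHeight) / pageHeight : 0;
  const lightness = 96 + 2 * Math.cos(2 * Math.PI * withinPage);
  return `hsl(${hue}, 40%, ${lightness.toFixed(1)}%)`;
}
```

In a browser it could be driven from a scroll listener, e.g. `window.addEventListener("scroll", () => { document.body.style.backgroundColor = locationColour(window.scrollY, document.documentElement.scrollHeight - window.innerHeight, window.innerHeight); })`. Keeping both components subtle is the design choice here: the aim is subconscious recall, not a visible position indicator.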

davidar commented 9 years ago

@rht You should now be able to directly annotate https://davidar.io/TeX.js/ (and any other page using TeX.js) thanks to @hypothesis (cc @RichardLitt @nickstenning) :smile:

Note to self: think about integrating @ipfs and @hypothesis (cc @jbenet)

jbenet commented 9 years ago

@davidar yes, we should do that. there's much overlap.

cc @tilgovi -- we should put public annotations on ipfs. -- also, once we get capabilities, private ones too

jbenet commented 9 years ago

@davidar this works very well, good stuff!

tilgovi commented 9 years ago

Would this be a good repo to open an issue for designing and discussing ipfs comments?

whyrusleeping commented 9 years ago

@tilgovi i think so

tilgovi commented 9 years ago

Opened #12.

jbenet commented 8 years ago

@davidar where was your hypothesis annotated version? not finding it

davidar commented 8 years ago

@jbenet the hypothesis enabled version doesn't seem to have made it into ipfs yet, will add it to my to-do list

jbenet commented 8 years ago

cc @bigbluehat

ipfs / apps

TeX for the web #5