manuels / texlive.js

Compiling LaTeX (TeX live) in your browser
http://manuels.github.com/texlive.js/
GNU General Public License v2.0

Server configuration sensitivity #26

Closed: xylo04 closed this issue 9 years ago

xylo04 commented 9 years ago

I'm having luck running my app and compiling TeX on localhost using the Node.js http-server, but with other server configurations (Python SimpleHTTPServer, Google App Engine) I tend to get errors like:

Runaway definition?
->\let \@oddfoot \@empty \def \@oddhead { 
! File ended while scanning definition of \ps@headings.
<inserted text> 
                }
l.2 

)
Runaway definition?
->\let \@oddfoot \@empty \def \@oddhead { }\let \@unprocessedoptions \ETC.
! File ended while scanning definition of \ps@headings.
<inserted text> 
                }
<*> &latex input.tex

! Emergency stop.
<*> &latex input.tex

!  ==> Fatal error occurred, no output PDF file produced!
Transcript written on input.log.

Or sometimes:

/bin/this.program: fatal: Could not undump 1 4-byte item(s) from latex.fmt.

This second error is seemingly fixed by clearing my browser's cache. Based on that, I suspect it's a header issue, maybe encoding, but I'm not sure.

Are you aware of any server configuration tricks that need to be addressed for texlive.js to work properly?

manuels commented 9 years ago

Nope, not really. Is this consistent in all browsers?

xylo04 commented 9 years ago

I was working primarily in Chrome 42; I just tried Firefox 37, with similar (but not identical) errors to the first error. Firefox actually gave me this warning about my index.html, though:

The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol.

So I'll try declaring UTF-8 encoding, either through response headers or a meta tag in the document, and see if that clears up the first error.
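For reference, the in-document form of that declaration is the standard meta tag; the HTML spec expects it within the first 1024 bytes of the page, so it should go at the top of head:

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Declare UTF-8 before any non-ASCII content appears -->
    <meta charset="utf-8">
  </head>
</html>
```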

The second error, Could not undump, seems to happen most often on the second compile. (I'm instantiating a new PDFTeX object each time, as recommended elsewhere, so that shouldn't be the issue.)
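For context, the per-compile instantiation looks roughly like this. This is a sketch following the pattern in the project README; that compile() resolves with a PDF data URL is my understanding of the API, not something I've verified against every version:

```javascript
// Sketch: create a fresh PDFTeX instance for every compile run, per the
// README's advice, rather than reusing one instance across runs.
// Assumes texlive.js has been loaded and exposes the PDFTeX global.
function compileOnce(latexSource) {
  var pdftex = new PDFTeX();          // fresh instance each run
  return pdftex.compile(latexSource); // promise resolving to a PDF data URL
}

// Usage (browser only):
// compileOnce('\\documentclass{minimal}\\begin{document}Hi\\end{document}')
//   .then(function (dataUrl) { /* show the PDF */ });
```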

xylo04 commented 9 years ago

I have a reliable reproduction case for at least the Could not undump error, but it's in a prototype I'm not ready to publish yet. Can you email me, and I'll provide details? I just made my email address available on my profile.

manuels commented 9 years ago

You guys at Google...

I'd prefer an open discussion. Are these details really so confidential?

xylo04 commented 9 years ago

Not particularly, but I have to have approval before releasing open source (technically, even patches/pull requests), yada yada... I'll be very happy once I am able to open source my project so I don't have to keep tiptoeing around red tape!

You can find the prototype here; it will open a tex file in Google Drive that I've shared publicly. You'll have to authorize my app to use a Google account of yours, but only files you explicitly authorize will be readable. After that, the file's content should show up in the left pane; clicking the red button should begin compiling, and the generated PDF should be previewed in the right pane.

Clicking the red button a second time causes the could not undump error, as does reloading the page and clicking the red button for the first time in the new session. I'm able to reproduce this in Chrome and Firefox. Clearing the browser's cache makes compiling work again. This behavior is different from what I experience using the Node http-server locally: in that environment, I'm able to compile time after time to my heart's content.

Note this is still a prototype and has some rough edges, not to mention I'm a backend engineer by trade and it's been a while since I've done much JavaScript!

EDIT: I've moved the prototype here; if you see an SSL warning page, you simply need to type 'danger' in the window to bypass.

manuels commented 9 years ago

Did you compile the latex compiler to js yourself or did you use my version?

xylo04 commented 9 years ago

I haven't tried compiling it myself yet; I used the copies from your GitHub repo.

manuels commented 9 years ago

This error is really weird. My guess is that your server is serving corrupted files (or files that are being interpreted incorrectly). I'd check the MIME types, and maybe use PDFTeX.FS_readFile() to verify that the contents of e.g. latex.fmt aren't corrupted.
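A sketch of that check might look like the following; the helper is my own, and the FS_readFile call and file path are assumptions based on texlive.js's Emscripten file-system wrapper:

```javascript
// Compare a file read back from the Emscripten FS against its expected size.
// FS_readFile typically returns a Uint8Array, so .length is the byte count.
function checkFileSize(contents, expectedBytes) {
  var actual = contents.length;
  return actual === expectedBytes
    ? 'ok (' + actual + ' bytes)'
    : 'MISMATCH: got ' + actual + ' bytes, expected ' + expectedBytes;
}

// In the browser, something like (path and size are illustrative):
// var fmt = pdftex.FS_readFile('/latex.fmt');
// console.log(checkFileSize(fmt, EXPECTED_FMT_BYTES));
```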

xylo04 commented 9 years ago

No luck yet. I've noticed that App Engine is not sending a Content-Length header, which according to some Googling is OK because it's supposed to use Transfer-Encoding: chunked instead; only I don't see it doing that either. So perhaps it's an App Engine issue. I feel like I saw the undump error on a different server before, but I don't remember which one, maybe Apache.

xylo04 commented 9 years ago

Another difference that could possibly explain it is gzip. I would think that whatever does the XHR to load e.g. latex.fmt or article.cls would be smart enough to ungzip it, or that the browser would do so before handing the response back, but if that weren't the case, it would explain why the files look corrupted.

manuels commented 9 years ago

Yeah, it should ungzip it automatically. But have you tried enabling gzip on your Node.js server, just to be sure?

xylo04 commented 9 years ago

Well, I finally started using FS_readFile properly. I found that latex.fmt (which is not transferred with gzip) is being loaded into the Emscripten file system correctly, but minimal.cls is not: it's supposed to be 2028 bytes long, but when served by App Engine it's truncated to 1032 bytes in the file system, which is suspiciously close to the 1039 bytes that were transferred over the wire while it was gzipped. The content of the file is plaintext (not gzipped) but truncated. I'm still trying to see whether I can enable gzip locally, or disable gzip on App Engine, to verify that's the root cause.

manuels commented 9 years ago

Whoops, that sounds like a rather serious bug in App Engine. Maybe this helps with debugging: https://cloud.google.com/appengine/kb/general

We use a combination of request headers (Accept-Encoding, User-Agent) and response headers (Content-Type) to determine whether or not the end-user can take advantage of gzipped content. This approach avoids some well-known bugs with gzipped content in popular browsers. To force gzipped content to be served, clients may supply 'gzip' as the value of both the Accept-Encoding and User-Agent request headers. Content will never be gzipped if no Accept-Encoding header is present.

xylo04 commented 9 years ago

Alright, I can report some progress!

I was able to find a way to get App Engine to serve all of the TeX Live resources, .cls files in particular, with Content-Type: application/octet-stream, which appears to disable gzip. With that in place, I can compile reliably the first time I load the page.
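I don't have the exact handler in front of me, but the shape of it in app.yaml is roughly this; the paths and extensions here are hypothetical, while mime_type is the documented field that forces the Content-Type:

```yaml
# Hypothetical static handler: serve TeX Live resources as raw octet streams
handlers:
- url: /texlive/(.*\.(cls|sty|fmt))
  static_files: texlive/\1
  upload: texlive/.*\.(cls|sty|fmt)
  mime_type: application/octet-stream
```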

The second time I compile, I'm back to the Could not undump error, and with the FS_readFile logging in place I can see that the file doesn't exist in the Emscripten file system. Looking at the Network tab, App Engine is returning a 304 Not Modified instead of re-serving the file. My next step is to see if I can disable that behavior and always serve the file content.
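One workaround I may try is a cache-busting query parameter on the resource URLs, so the server never has a cached validator to answer 304 against. A sketch (the parameter name is arbitrary, and whether texlive.js lets me rewrite these URLs is an open question):

```javascript
// Append a unique query parameter so each fetch bypasses HTTP caching.
function bustCache(url) {
  var sep = url.indexOf('?') === -1 ? '?' : '&';
  return url + sep + 'nocache=' + Date.now();
}

// e.g. bustCache('texlive/minimal.cls') -> 'texlive/minimal.cls?nocache=1430...'
```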

I suspect Emscripten is doing something non-standard to fetch files from the network and populate them in the virtual file system. If that were the case, it would explain why gzipped responses are truncated, and why cached responses end up as empty files in the file system.

EDIT: Just saw your response, and yep, it could be an App Engine bug as well. At the very least, they seem to have a non-standard default configuration, and I'm having one heck of a time configuring around it.

xylo04 commented 9 years ago

I posed the question to emscripten-discuss.

xylo04 commented 9 years ago

Huh, I'm no longer seeing issues with 304 Not Modified responses. It looks like App Engine has stopped sending them and is always returning 200s with content. I'm not sure what changed, but it's working at the moment.

xylo04 commented 9 years ago

I'm still not sure how I configured around this, but until further notice I think it's safe to assume that the following holds true when serving texlive.js and probably all Emscripten libraries:

manuels commented 9 years ago

Really strange. Did you try to set up gzip and/or caching on your Node.js setup?

xylo04 commented 9 years ago

Unfortunately, I haven't taken the time to confirm it properly that way. I should, and I'll see if I can make some time to do so, but that's my current impression.