jgm / pandoc

Universal markup converter
https://pandoc.org

Provide an option to embed only local resources #8362

Open DhruvaSambrani opened 2 years ago

DhruvaSambrani commented 2 years ago

Explain the problem. When a Markdown file is compiled to HTML using --mathjax and --embed-resources, pandoc also tries to embed the MathJax fonts, which are not available, so the math renders badly. Dropping --embed-resources makes the fonts download at view time. MWE:

$$t=\alpha^2 \begin{bmatrix}
3 & 4 \\
4 & 5
\end{bmatrix}$$

$ pandoc test.md --standalone --embed-resources --mathjax -o test.html

Pandoc version?

pandoc 2.19.2
Compiled with pandoc-types 1.22.2.1, texmath 0.12.5.3, skylighting 0.13,
citeproc 0.8.0.1, ipynb 0.2, hslua 2.2.1
Scripting engine: Lua 5.4

jgm commented 2 years ago

So, don't use --embed-resources with --mathjax? Or is there some particular change to pandoc's behavior that you have in mind? If you're going to have access to the net to download the fonts, then there's little point in using --embed-resources for other things.

jgm commented 2 years ago

See also commit 63deba49d4f93a4ed1520b9a4b11786e1b8c2eb9 and #682.

jgm commented 2 years ago

There is actually a way to have resources embedded while leaving MathJax as just a link to the CDN. You'll need to use a custom template. Generate the standard html5 template using pandoc -D html5 > newtemplate.html. Replace the part that says

$if(math)$
  $math$
$endif$

with

 <script data-external="1"
  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js"
  type="text/javascript"></script>

Note the data-external attribute; see "Linked media" in the manual. This causes the resource embedding to skip this tag.

Now use pandoc with

pandoc --template newtemplate.html --mathjax --embed-resources -s
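
The manual template edit described above can also be scripted. The following is a minimal sketch, assuming the template still contains the three-line $if(math)$ block exactly as quoted; the function name patch_template is hypothetical and not part of pandoc, and the script tag is copied from the comment above.

```python
import re

# Assumption: the default template contains the block quoted above;
# only the whitespace between the three lines is allowed to vary.
MATH_BLOCK = re.compile(r"\$if\(math\)\$\s*\$math\$\s*\$endif\$")

# Script tag copied from the comment above.
MATHJAX_TAG = (
    '<script data-external="1"\n'
    '  src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml-full.js"\n'
    '  type="text/javascript"></script>'
)


def patch_template(template: str) -> str:
    """Replace the first $if(math)$ ... $endif$ block with the script tag."""
    # A callable replacement avoids any escape processing of the tag text.
    patched, count = MATH_BLOCK.subn(lambda _m: MATHJAX_TAG, template, count=1)
    if count == 0:
        raise ValueError("math block not found; the template layout may differ")
    return patched
```

To use it, one would read the file produced by pandoc -D html5 > newtemplate.html, pass its text through patch_template, and write the result back before running the pandoc command above. If the template layout has changed in a later pandoc version, the function raises instead of silently doing nothing.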

DhruvaSambrani commented 2 years ago

Yeah I was thinking the same, but erred on the side of opening an issue.

That said, does it make sense to add an option that embeds only local links? My current use case is an HTML report with some math and a few images, which I want to share as a single standalone file rather than shipping the images separately, since I don't want to set up a server just for this.

jgm commented 2 years ago

I can see the point of that. I'll change the title of your issue.

DhruvaSambrani commented 1 year ago

A simpler solution may be to make the math variable expand to the data-external="1" version by default, and let people use --mathjax=url to force embedding of MathJax. This may not be extensible, though.

jgm commented 1 year ago

If we change the default template like so:

$if(math)$
$if(mathjax)$
  <script data-external="1" src="$mathjaxurl$" type="text/javascript"></script>
$else$
  $math$
$endif$
$endif$

then it would add data-external="1" automatically, while still allowing users to change this behavior if they want, by modifying the default template.

But I'm not sure this is the best solution. Some people might prefer to have the core mathjax stuff baked in, even if the fonts are not quite right.

jgm commented 1 year ago

Another option is to add an optional attribute to --embed-resources. --embed-resources=[local|remote|all] with default all.
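
To make the proposed semantics concrete, here is a rough sketch of how a local|remote|all switch could decide whether a given reference gets embedded. This is not pandoc's implementation (pandoc is written in Haskell and has no such option at the time of this thread); the function names and the rule "http(s) or protocol-relative means remote" are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical modes mirroring the proposed --embed-resources=[local|remote|all].
MODES = {"local", "remote", "all"}


def is_remote(ref: str) -> bool:
    """Treat http(s) and protocol-relative references as remote (an assumption)."""
    if ref.startswith("//"):
        return True
    return urlparse(ref).scheme in {"http", "https"}


def should_embed(ref: str, mode: str = "all") -> bool:
    """Decide whether a resource reference would be embedded under the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "all":
        return True
    # "remote" embeds only remote references, "local" only the rest.
    return is_remote(ref) == (mode == "remote")
```

Under this sketch, should_embed("images/plot.png", "local") is true while the MathJax CDN URL is skipped in local mode, which is the behavior requested in the original report.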

ntnsndr commented 1 year ago

I've had this same issue with embedded YouTube videos (which I frequently use in class)—the iframe is black if --embed-resources is used.

If I use the data-external="1" workaround, the YouTube video shows, but then when I save the presentation as .html in Firefox, it no longer produces a single file, but an .html file plus a folder with additional things.

The expected behavior for me would be for --embed-resources to not embed remote resources by default, and still produce a single .html file.

jgm commented 1 year ago

Pandoc has nothing to do with the behavior of "Save" in Firefox.

ntnsndr commented 1 year ago

I'm sorry, I recognize that. I'm just trying to add more data points to the discussion, in case Firefox's behavior is revealing in some way.

jgm commented 1 year ago

I think that if the resource is not embedded, then Firefox is not going to produce a single .html file. So there's really no way to get what you're asking for.

ChenZhongPu commented 1 year ago

Another option is to add an optional attribute to --embed-resources. --embed-resources=[local|remote|all] with default all.

When will this new feature be added?

jgm commented 1 year ago

Still not sure whether a new command-line option is really needed. The mathjax issue could be addressed by the solution in comment https://github.com/jgm/pandoc/issues/8362#issuecomment-1289466907 which still seems good to me.

allefeld commented 1 year ago

I support the addition of this new option --embed-resources=[local|remote|all]. For a defaults file, it would be embed-resources: [none|local|remote|all], for backwards compatibility with false as an alias for none and true as an alias for all.
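
The backwards-compatible aliasing suggested here could look roughly like the sketch below. It assumes the defaults-file value arrives either as a YAML boolean or as a string; the function name normalize_embed_option is hypothetical, and none of this exists in pandoc.

```python
def normalize_embed_option(value) -> str:
    """Map a defaults-file value to one of: none, local, remote, all."""
    # Booleans are checked by identity so that integers like 1 are rejected.
    if value is True:
        return "all"
    if value is False:
        return "none"
    if isinstance(value, str) and value.lower() in {"none", "local", "remote", "all"}:
        return value.lower()
    raise ValueError(f"unsupported embed-resources value: {value!r}")
```

Existing defaults files that say embed-resources: true or embed-resources: false would keep their current meaning under this mapping.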

My use case is similar to something already mentioned: I prepare materials for a class as an HTML file, uploaded on Moodle. This document is meant to be viewed online, therefore loading resources from the net is not a problem. However, additionally uploading local external resources is cumbersome.

Editing the template is a reasonable workaround, but is harder when using Pandoc through Quarto – their template is not a simple file which can be copied and modified, it is dynamically generated (imho unfortunately). I came up with the following hack:

embed-resources: true
html-math-method:
  method: mathjax
  url: ""
header-includes: <script data-external="1" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js" type="text/javascript"></script>

Edit: url: "" seems to (sometimes?) break the HTML output. Better to use data:,.

cderv commented 1 year ago

Editing the template is a reasonable workaround, but is harder when using Pandoc through Quarto – their template is not a simple file which can be copied and modified, it is dynamically generated (imho unfortunately).

In Quarto, we already patch the template by default so that math is excluded from self-contained output. This is explained in https://quarto.org/docs/output-formats/html-publishing.html#standalone-html

There is a specific option, self-contained-math: true, that keeps the default pandoc behavior.

So there should be no need to patch the template with quarto.

If this is not working (anymore), it is a regression and a bug that we should fix. Open an issue in Quarto if so. Thanks!

allefeld commented 1 year ago

@cderv I tried again, and now it works. I can't reconstruct what the problem was that led me to this Pandoc issue and my elaborate hack.

I still think it would be better to solve this on the Pandoc side with extended options to embed-resources, instead of having Quarto patch the template. As expressed here, I wish Quarto would do fewer opaque things, to make it easier for Pandoc users to adapt it to their needs.

allefeld commented 6 months ago

I ran into this problem again while using Noto Sans & Mono in several weights in a Quarto document, from Google Fonts referenced using an @import rule in CSS.

Without embedding, the HTML file is created in less than a second and has 29,972 bytes; with embedding it takes half a minute and the file has 38,923,148 bytes.

I'll try to circumvent this by referencing the webfonts using <link data-external="1" href="…" rel="stylesheet"> instead.
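
For documents that already reference remote stylesheets or scripts through link or script tags, adding the attribute by hand could be replaced by a small preprocessing step. The sketch below is a regex-based approximation, not a real HTML parser, and the function name mark_remote_external is hypothetical. It does not touch @import rules inside CSS, so those would still need to be converted to link tags first.

```python
import re

# Rough match for <link ...> and <script ...> opening tags.
# Assumption: attribute values contain no ">" characters.
TAG = re.compile(r"<(link|script)\b([^>]*)>", re.IGNORECASE)

# An href or src whose value starts with http://, https:// or //.
REMOTE_REF = re.compile(r"""\b(?:href|src)\s*=\s*["'](?:https?:)?//""", re.IGNORECASE)


def mark_remote_external(html: str) -> str:
    """Add data-external="1" to link/script tags that point at remote URLs."""
    def fix(match):
        name, attrs = match.group(1), match.group(2)
        # Leave tags alone if already marked or if they reference local files.
        if "data-external" in attrs or not REMOTE_REF.search(attrs):
            return match.group(0)
        return f'<{name} data-external="1"{attrs}>'

    return TAG.sub(fix, html)
```

Tags that reference local files are left unchanged, so --embed-resources would still inline them while skipping the marked remote ones.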

But I still think having an extended syntax --embed-resources=[local|remote|all] would be very useful. The value of embed-resources for me is mainly that I can send or upload a document as a single file. Embedding webfonts and other network resources is not necessary and bloats the output.