Closed dleeftink closed 5 months ago
Before: broken blocks
After: original blocks
Can you compare it with the official export effect?
As, display without the css borders? Or the result from the 'Export/Debug' actions?
As, display without the css borders? Or the result from the 'Export/Debug' actions?
Use the official Export to PDF
feature to generate the PDF, then compare the results with the current output. We aim to maintain consistency with the official rendering as much as possible.
Will do. Here is the prettified HTML from the debugger in the meantime (notice the difference at 'some trailing text'):
<div class="print">
<div class="markdown-preview-view markdown-rendered show-properties">
<h3 data-heading="hello">
hello<a href="af://h3-0" class="md-print-anchor"></a>
</h3>
\n
<p>
Paragraph 1 with
<span
alt="bpdf(nest) > Nested block"
src="bpdf(nest)#Nested block"
class="internal-embed markdown-embed inline-embed is-loaded"
></span>
</p>
<div class="markdown-embed-title"></div>
<div class="markdown-preview-view markdown-rendered show-indentation-guide">
<h2 data-heading="Nested block">
Nested block<a href="af://h2-1" class="md-print-anchor"></a>
</h2>
\n
<p>Embedded paragraph.</p>
</div>
some trailing text. <!--broken block-->
<p></p>
\n
<p>Paragraph 2</p>
\n
<h2 data-heading="world">
world<a href="af://h2-2" class="md-print-anchor"></a>
</h2>
\n
<p>Paragraph 3</p>
</div>
</div>
<div class="print">
<div class="markdown-preview-view markdown-rendered show-properties">
<h3 data-heading="hello">
hello<a href="af://h3-0" class="md-print-anchor"></a>
</h3>
\n
<p>
Paragraph 1 with
<span
alt="bpdf(nest) > Nested block"
src="bpdf(nest)#Nested block"
class="internal-embed markdown-embed inline-embed is-loaded"
><div class="markdown-embed-title"></div>
<div class="markdown-preview-view markdown-rendered show-indentation-guide">
<h2 data-heading="Nested block">
Nested block<a href="af://h2-1" class="md-print-anchor"></a>
</h2>
\n
<p>Embedded paragraph.</p>
</div></span
>
some trailing text. <!--original block-->
</p>
\n
<p>Paragraph 2</p>
\n
<h2 data-heading="world">
world<a href="af://h2-2" class="md-print-anchor"></a>
</h2>
\n
<p>Paragraph 3</p>
</div>
</div>
Great, could you please test non-ASCII characters?
Here is the PDF output:
Great, could you please test non-ASCII characters?
I have added two btoa_utf8
and atob_utf8
utility functions to handle unicode characters.
In general however, I think it would be better to move away from this.preview.executeJavascript(...)
to build the PDF DOM. This would require a more significant rewrite along the lines of:
// Setting up containers
const printEl = document.body.createDiv("print");
const viewEl = printEl.createDiv({
cls: "markdown-preview-view markdown-rendered " + cssclasses.join(" ")
});
...
// Rendering markdown
await MarkdownRenderer.render(app, lines.join("\n"), viewEl, file.path, comp);
await MarkdownRenderer.postProcess(...);
...
// Cloning elements
const printDoc = document.implementation.createHTMLDocument("document");
printDoc.body.appendChild(printEl.cloneNode(true));
// Copy Obsidian styles
printDoc.head = document.head.innerHTML;
// Preparing data transfer
const blob = new Blob([printDoc], {
type: "text/html",
});
// Create object URL
const printUrl = window.URL.createObjectURL(blob);
...
// Transfer HTML to print preview
this.preview.loadUrl(printUrl);
this.preview.webContents.once("did-finish-load", async () => {
// Clean up after --> not sure URL's can be revoked from within another webview
URL.revokeObjectURL(printUrl);
... additional logic ...
})
But even then, the issue of .innerHTML
breaking the block context of nested blocks may persist. The reason why it works in native Obisidian is that embeds are inserted after the initial DOM has been laid out, which circumvents block context breakage. Whether this method used by the Obsidian devs follows the proper HTML5 spec is a discussion I'll leave for another day...
In fact, the way Obsidian handles internal embeds via nested divs is inherently incompatible with how pagedjs lays out pages, as its layouting algorithm requires HTML to adhere to the spec. Even when implementing a mechanism where block level elements are replaced with inline elements, this would result in user css styles likely becoming incompatible with the post-processed DOM.
I've been investigating integrating pagedjs and Obsidian for a while now, and am happy to draw up a roadmap or set of requirements if you're interested.
In fact, the way Obsidian handles internal embeds via nested divs is inherently incompatible with how pagedjs lays out pages, as its layouting algorithm requires HTML to adhere to the spec. Even when implementing a mechanism where block level elements are replaced with inline elements, this would result in user css styles likely becoming incompatible with the post-processed DOM.
I've been investigating integrating pagedjs and Obsidian for a while now, and am happy to draw up a roadmap or set of requirements if you're interested.
I initially attempted to integrate Paged.js and Obsidian, but failed (blank pages couldn't display). Do you have any good ideas?
I initially attempted to integrate Paged.js and Obsidian, but failed (blank pages couldn't display). Do you have any good ideas?
The latest beta has a lot of pagination fixes, but still has some issues when elements have display: none
, overflow: hidden
or nested <br>
elements inside complex trees (e.g. tables). Generally, pagedjs works best with a 'flat' DOM structure where each element has an intrinsic width and height (hence why Obsidian embeds/code block replacements may cause issues).
However, the devs are quick to respond to your particular issues as I haven't encountered the blank page issue previously. My guess is that the chunker and paginator have not properly finished. It is also good practice to isolate the pagedJSPolyFill
inside a webview
or <iframe>
and have it run unaffected by styles and scripts present in the parent context.
In my view, the biggest hurdle to make pagedJS work with Obsidian is agreeing on how nested markdown is transformed into a single document: how to signal which notes are chapters, sections or paragraphs? How is the table of contents generated? How to best collect footnotes/sidesnotes and multi-column markdown? Resolve inconsistent heading levels? Merge/split list and tables? Dataview elements? The list goes on.
My current approach is generating a DAG and an AST of each note in the DAG and have a list of custom markdown transformers that are able to handle various cases. Making this work for all edge cases is no small feat.
This took a while to track down, after noticing my
@media print { ... }
css files where rendering differently using your plugin compared to Obsidian's native print functionality.Issue: broken block contexts
A better (but still not ideal) solution to have
span.markdown-embeds
behave similarly to native Obsidian, requires nested block-level elements to be inserted after the document.body.innerHTML has been set in line 206. Namely, certain block level elements (divs, blockquotes, etc.) force a break in the block context of a parent element (most notably<p>
) which messes up the resulting HTML if set usingdocument.body.innerHTML
:Leads to the following output with orphaned nodes when parsed by
.innerHTML
:A possible solution is base64 encoding the content of
span.markdown-embed
elements starting from the last element by changinghttps://github.com/l1xnan/obsidian-better-export-pdf/blob/16229e564fd85a0c3a7764f8ed905e708aba8784/src/render.ts#L234-L246
to this encoding function and then recursively decoding the base64 encoded elements after
document.body.innerHTML
has been set:https://github.com/l1xnan/obsidian-better-export-pdf/blob/16229e564fd85a0c3a7764f8ed905e708aba8784/src/modal.ts#L206
using this decoding function.
Workaround
This workaround maintains nested block level elements without forcing breaks in the block context of the parent element and messing up the layout. The following functions are added
encodeEmbeds(doc)
[...document.querySelectorAll('span.markdown-embeds')].reverse()
;span.markdown-embed
elements usingspan.innerHTML = btoa(span.innerHTML)
decodeAndReplaceEmbed(element)
appendWebview
functionspan.markdown-embeds
recursively<??>
Note: the
encodeEmbeds(doc)
function may need to be rewritten to encode the deepest nested (sibling) elements first instead of starting with the last element. However, simply reversing the order of iteration seems fine in my testing, as deeply nested elements always come later when selected usingdocument.querySelectorAll()
.