Open JohnMcLear opened 10 years ago
What sort of behaviour would you expect? Page breaks strike me as being an artefact of printing of paper, which doesn't really apply when translating to HTML. Open to suggestions though.
To be honest just whacking in a <span class='pageBreak'></span> would be fine for me.
I'd expect you would want me to use a custom style rule for this. If that's the case, that's fine, just lemme know which style map key to use :)
I use
page-break-after:always;page-break-inside:avoid;-webkit-region-break-inside: avoid;
to generate the actual page breaks in Etherpad.
I don't use it, but I happen to know that Dreamweaver would wrap content in <div> tags to mark section breaks (Page Layout > Page Setup > Breaks). I wouldn't be a fan of this approach as I sometimes use parent-child selectors in my CSS.
I'm not sure that I like the idea of adding classes to the output, @JohnMcLear.
How about adding a simple <hr/> tag?
The suggestion of using a custom style mapping is the approach that seems best to me. That way, by default we do nothing, but the user can customise the behaviour to whatever HTML they want.
Can you give me an example of how to write a style map that maps page breaks to an hr tag?
Page breaks aren't supported at the moment. There's some code to handle them, but that likely requires some more work.
For the technical detail: one way that Word encodes page breaks is as an element within a paragraph. As it works right now, that would result in hr tags within p elements, which likely isn't the desired behaviour. Lifting the breaks up to the top level is likely to give better results.
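As a rough illustration of what "lifting the breaks up to the top level" could look like, here is a sketch that post-processes HTML in which an hr has ended up inside a p. The name liftBreaks is hypothetical and not part of mammoth. It works on strings with regular expressions, and assumes simple, non-nested paragraphs with no inline element (such as em) spanning the break.

```javascript
// Hypothetical helper (not part of mammoth): split any <p> that contains an
// <hr> into separate paragraphs, with the <hr> moved up between them.
function liftBreaks(html) {
  // Match each <p ...>...</p>; the \s after "p" avoids matching <pre>.
  return html.replace(/<p(\s[^>]*)?>([\s\S]*?)<\/p>/g, (match, attrs = "", inner) => {
    const parts = inner.split(/<hr\s*\/?>/);
    if (parts.length === 1) return match; // no break in this paragraph
    return parts
      // Re-wrap each non-empty piece in a paragraph with the original attributes.
      .map((part) => (part.trim() === "" ? "" : `<p${attrs}>${part}</p>`))
      .join("<hr>");
  });
}

console.log(liftBreaks("<p>Page one.<hr />Page two.</p>"));
// prints: <p>Page one.</p><hr><p>Page two.</p>
```

A real implementation would more likely do this on the document tree rather than on the serialised HTML, so that inline formatting around the break can be closed and reopened properly.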
I have a use case where customers – wrongly – insert page breaks at the end of pages, and I need to replace them with a space. For that reason it would be good to have a style mapping available that captures (manual) page breaks.
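A minimal sketch of that use case, assuming the page breaks have already been surfaced in the HTML as hr elements (an assumption, as is the function name replaceBreaksWithSpace). Note that it cannot tell page breaks apart from any other hr in the document.

```javascript
// Hypothetical helper: replace every <hr> (assumed here to mark a manual page
// break) and any whitespace around it with a single space.
function replaceBreaksWithSpace(html) {
  return html.replace(/\s*<hr\s*\/?>\s*/g, " ");
}

console.log(replaceBreaksWithSpace("<p>end of page<hr />start of next</p>"));
// prints: <p>end of page start of next</p>
```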
Having the page breaks would be nice for translating into other formats or processing the output HTML.
Hi there, and thank you for this awesome lib :)
I'm using mammoth to turn a structured (with specific styles) .docx file into HTML, do some tweaks on it and then use PagedJS to turn it into a PDF to be printed.
In this case the output is in fact paper again, so page breaks do matter.
Could you please consider supporting page breaks?
If you have never stumbled upon this, there is a whole open-source movement (the Coko Foundation) advocating for using HTML as the single source for publishing books and journal papers, using the CSS Paged Media standard to define the layout of the PDF output. This standard hasn't been implemented yet by any of the major browsers, so they built PagedJS, which is in essence a glorified polyfill for this standard. It is already used in production by many publishing houses, and was recently used to produce both a book and a webapp for the Louvre in Paris from the same HTML source.
As above, the problem is that it's not obvious (to me, at least!) what the expected behaviour would be, given a page break can occur in the middle of a paragraph.
If you can provide a minimal example document and the expected HTML (especially with mid-paragraph page breaks), then that would help.
Here I meant only manual page breaks. It didn't even occur to me that one would want to know about automatic page breaks, when text naturally overflows a page and continues on the next one :)
In the case of manual page breaks, is that already possible? For me it could be either a separate tag or a way to apply a specific CSS class to the first element after the page break. If there is already a way to do this, maybe adding it to the docs wouldn't hurt :)
There's some support for breaks, but it is intentionally undocumented since it's still subject to change.
Could you provide a minimal example document and the expected HTML?
Alright so here's a very simple example .docx file: example.docx
What I'd like to get back would be either this:
<p>This content is on page one.</p>
<hr>
<p>This one on page two.</p>
<p><em>And it has</em></p>
<h1>Some more content to it</h1>
<h2>With a few styles.</h2>
<hr>
<p>This is page three.</p>
or something like that:
<p>This content is on page one.</p>
<p class="break-before">This one on page two.</p>
<p><em>And it has</em></p>
<h1>Some more content to it</h1>
<h2>With a few styles.</h2>
<hr>
<p class="break-before">This is page three.</p>
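For what it's worth, the second form can be derived from the first with a small post-processing step. The sketch below is only an illustration: markBreakBefore is a hypothetical name, and the regular expression assumes each hr is directly followed by the opening tag of the element that should start the new page.

```javascript
// Hypothetical helper: drop each <hr> and add class="break-before" to the
// element that immediately follows it.
function markBreakBefore(html) {
  return html.replace(
    /<hr\s*\/?>\s*<([a-z][a-z0-9]*)((?:\s[^>]*)?)>/gi,
    (match, tag, attrs) => {
      if (/\sclass="/.test(attrs)) {
        // The element already has a class attribute: prepend to it.
        return `<${tag}${attrs.replace(/\sclass="/, ' class="break-before ')}>`;
      }
      return `<${tag} class="break-before"${attrs}>`;
    }
  );
}

console.log(markBreakBefore("<p>Page one.</p>\n<hr>\n<p>Page two.</p>"));
// prints:
// <p>Page one.</p>
// <p class="break-before">Page two.</p>
```

A CSS rule such as .break-before { break-before: page; } would then be one way to ask a paged renderer to start a new page at those elements.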
I think you can already use a style map along the lines of:
br[type='page'] => hr
to get what you want, but be warned that the exact syntax and behaviour might change in the future!
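For anyone landing here later, this is roughly how that style map could be passed to mammoth from Node.js. It is a sketch rather than official documentation: convertWithPageBreaks and pageBreakStyleMap are names made up for this example, and the br[type='page'] matcher is the undocumented syntax mentioned above, so it may stop working.

```javascript
// Style map from the comment above; undocumented, so it may change.
const pageBreakStyleMap = ["br[type='page'] => hr"];

// Convert a .docx file to HTML with manual page breaks mapped to <hr>.
// The second parameter exists only so that a stand-in can be supplied in
// tests; by default the real mammoth module is loaded when the function runs.
async function convertWithPageBreaks(docxPath, mammoth = require("mammoth")) {
  const result = await mammoth.convertToHtml(
    { path: docxPath },
    { styleMap: pageBreakStyleMap }
  );
  // result.messages holds any warnings produced during conversion.
  return { html: result.value, messages: result.messages };
}

// Usage (requires mammoth to be installed):
// convertWithPageBreaks("example.docx").then(({ html }) => console.log(html));
```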
It's working 🎉 If it starts breaking one day I'll know where to look :) Thanks!
Any docs for how to support page breaks?