Open stalniy opened 5 years ago
+1
For now, the convertToHtml function does not return any page info.
I too have a need to treat each page of a Word document as an HTML page.
After reading your code, would this be solved by a style rule of
"br[type='page'] => div.page:fresh"
and then splitting the output on
<div class="page"></div>
or whatever element you choose?
It would need an option like ignorePageBreak to change the value of ignoreElements in docx/body-reader.js. Of course, it may be more complicated than that.
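For anyone who wants to try the idea, here is a minimal sketch of just the splitting step. It assumes the converted HTML already contains a marker element at each page break. Whether mammoth actually emits that marker (an empty div may be dropped during output cleanup, and rendered page breaks may need the option mentioned above) is not verified here. `splitHtmlIntoPages` is a hypothetical helper, not part of mammoth.

```javascript
// Hypothetical helper, not part of mammoth's API.
// Splits an HTML string into per-page chunks on a marker element that
// a style mapping is ASSUMED to have emitted at each page break.
function splitHtmlIntoPages(html, marker = '<div class="page"></div>') {
  return html
    .split(marker)                      // cut at every marker occurrence
    .map((page) => page.trim())         // drop whitespace around each chunk
    .filter((page) => page.length > 0); // skip empty pages (e.g. trailing marker)
}

// Example with a hand-written string standing in for converted output.
const sample =
  '<p>First page</p><div class="page"></div><p>Second page</p><div class="page"></div>';
const pages = splitHtmlIntoPages(sample);
pages.forEach((page, index) => {
  console.log(`page ${index + 1}: ${page}`);
});
```

The marker is a parameter so a different element can be used if the empty div does not survive conversion. A plain string split only works when the marker appears at the top level of the output, so a page break nested inside another element would produce unbalanced HTML chunks.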
Defo this is needed for my team :+1:
This would be extremely useful for our team: having page number metadata would help GPT parse our documents properly.
It would be good to be able to convert a big docx file in chunks (a few pages at a time).