Open alibou99 opened 5 years ago
Simpler way to reproduce this:
% pandoc -f html+epub_html_exts -t native
<p class="txt_courant_justif"><span epub:type="pagebreak" id="page_36" title="36"/>
Text1
<a class="apnb" epub:type="noteref" id="ap_ntb-002-1" href="p1chap2.xhtml#ntb-002">1</a>.
Text2
<a class="apnb" epub:type="noteref" id="ap_ntb-003-1" href="p1chap2.xhtml#ntb-003">2</a>
Text3
</p>
<p class="txt_courant_justif">
Text4
</p>
^D
[Para [Span ("page_36",[],[("type","pagebreak"),("title","36")]) [],SoftBreak,Str "Text1",SoftBreak,Str ""]
,Para [Str "Text4"]]
I'm new to Git, so excuse me if I don't understand everything, but what does this mean?
My source document is an EPUB 3.0 file, not HTML.
This gives a way to reproduce the underlying issue in a simpler way, without actually producing an epub (because the epub reader uses the html reader plus a special extension under the hood). It's really a "note to self" for me to diagnose this.
Thank you very much. I just tested the conversion via Calibre: no problem, I get the whole text. However, with Calibre the notes are not recognized as notes.
Yes, pandoc is stumbling on notes that refer to another file, such as href="p1chap2.xhtml#ntb-002".
With the commit I just pushed, we now get:
[Para [Span ("page_36",[],[("type","pagebreak"),("title","36")]) [],SoftBreak,Str "Text1",SoftBreak,Link ("ap_ntb-002-1",["apnb"],[]) [Str "1"] ("p1chap2.xhtml#ntb-002",""),Str ".",SoftBreak,Str "Text2",SoftBreak,Link ("ap_ntb-003-1",["apnb"],[]) [Str "2"] ("p1chap2.xhtml#ntb-003",""),SoftBreak,Str "Text3"]
,Para [Str "Text4"]]
which is an improvement. The missing text is no longer missing. However, the noterefs are being parsed as links rather than proper noterefs, so there is still work to do.
I work for a non-profit organization. We prepare books for digital braille so that they can be used by blind readers. For this, our pivot format is docx or RTF. For the moment Pandoc does the job, but because of this missing-text problem I am reviewing the whole procedure with a view to switching to another tool. I hope we can find a quick solution.
By tonight there should be a nightly available in pandoc-nightlies; this will at least solve the missing text problem.
Very good news. How can I get this corrected version as quickly as possible?
I installed pandoc via Chocolatey.
This is my first post on Git. Is there a specific command to update Pandoc on my computer and take advantage of the fix? Thank you.
Here's a binary of the latest Windows build: https://ci.appveyor.com/project/jgm/pandoc/build/job/gy92q5at64l3e68q/artifacts
Thank you very much, it works very well and I'm no longer missing text. Now I'm looking into the footnote problem. Here are two examples: with the first, the EPUB-to-docx conversion produces a Word document that recognizes the footnotes; with the second, it does not.
Example 1 (the good one):
<p class="nonindentb">Text1<a epub:type="noteref" class="noteref" id="fn-1" href="#fn1">1</a> Text2</p>
<div epub:type="footnote" id="fn1">
<p class="noindent0"><a class="link" href="#fn-1"><span style="color: #000000;">1</span></a>. Text...</p>
</div>
Example 2 (the bad one):
<p class="txt_courant_justif"><span epub:type="pagebreak" id="page_36" title="36"/>
Text1
<a class="apnb" epub:type="noteref" id="ap_ntb-003-1" href="p1chap2.xhtml#ntb-003">2</a>
Text2
</p>
<p class="txt_courant_justif">
Text4
</p>
<section class="defnotes" epub:type="footnotes">
<!--note--><aside class="ntb" epub:type="footnote" id="ntb-003">
<p class="txt_justif"><a href="p1chap2.xhtml#ap_ntb-003-1">2</a>. Text...</p></aside>
<!--note--></section></section>
I greatly appreciate your help
Yes, the problem is that pandoc currently will only pick up footnotes that are defined in the same file. In your second example the note is in a different file.
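To make the same-file versus other-file distinction concrete, here is a minimal Python sketch that sorts noteref targets into the two groups. It is not pandoc's implementation (pandoc is written in Haskell); `NoterefCollector`, `classify_noterefs`, and the `current_file` parameter are hypothetical names introduced only for illustration. Whether a given href counts as local depends on the name of the file being read, which this thread does not state.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class NoterefCollector(HTMLParser):
    """Collect the href of every <a epub:type="noteref"> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("epub:type") == "noteref":
            self.hrefs.append(attr.get("href", ""))


def classify_noterefs(markup, current_file):
    """Return (same_file, other_file) lists of noteref hrefs.

    current_file is the assumed name of the file that contains markup.
    """
    collector = NoterefCollector()
    collector.feed(markup)
    same_file, other_file = [], []
    for href in collector.hrefs:
        path = urlsplit(href).path
        # An empty path ("#fn1") or a path naming the current file is local.
        if path in ("", current_file):
            same_file.append(href)
        else:
            other_file.append(href)
    return same_file, other_file
```

For instance, `classify_noterefs(markup, "chapter.xhtml")` would place `#fn1` in the first list and `p1chap2.xhtml#ntb-003` in the second, where `chapter.xhtml` is a made-up file name.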
PS C:\files\dev\Pandoc> pandoc --version
pandoc.exe 2.7.2
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.7.7
Issue: when I convert EPUB files to md, docx, or html, some of the text is missing. This happens when there is a note call. Here is an example:
In this example, Text2 and Text3 are missing from the output.