Closed tillydray closed 6 months ago
I think the issue is really in the EPUB reader and it can be shown with this simple example:
% pandoc -o my.epub
# One
link to [twosub](#twosub)
# Two
ok
## twosub
ok
[WARNING] This document format requires a nonempty <title> element.
Defaulting to '-' as the title.
To specify a title, use 'title' in metadata or --metadata title="...".
% pandoc my.epub -t html
<p><span id="ch001.xhtml"></span></p>
<section id="ch001.xhtml#one" class="level1" data-number="1">
<h1 data-number="1">One</h1>
<p>link to <a href="#ch002.xhtml#twosub">twosub</a></p>
</section>
<p><span id="ch002.xhtml"></span></p>
<section id="ch002.xhtml#two" class="level1" data-number="2">
<h1 data-number="2">Two</h1>
<p>ok</p>
<section id="ch002.xhtml#twosub" class="level2" data-number="2.1">
<h2 data-number="2.1">twosub</h2>
<p>ok</p>
</section>
</section>
Here we get things like a reference to `#ch002.xhtml#twosub`. The fragment shouldn't contain the character `#`. I don't know if that's the only issue for org, but it may be one issue.
You could try changing
[[#hcp-nrsvuebib-0005.xhtml#otbooks][Old Testament Table of Contents]]
in your org output to
[[#hcp-nrsvuebib-0005.xhtml_otbooks][Old Testament Table of Contents]]
and changing
<<hcp-nrsvuebib-0005.xhtml#otbooks>>
to
<<hcp-nrsvuebib-0005.xhtml_otbooks>>
and see if that fixes the link. That would be good for me to know.
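To make the doubled `#` concrete, here is a minimal sketch using Python's standard `urllib.parse` on the href from the HTML output above. It only illustrates how a generic URL parser splits the string; it says nothing about how pandoc builds these identifiers internally.

```python
from urllib.parse import urlsplit

# href copied from the HTML output above
href = "#ch002.xhtml#twosub"

# The parser splits at the first '#', so everything after it,
# including the second '#', becomes part of the fragment.
fragment = urlsplit(href).fragment
has_nested_hash = "#" in fragment

print(fragment)
print(has_nested_hash)
```

The fragment grammar in RFC 3986 does not include a literal `#`, which is why an identifier of this shape is likely to confuse consumers of the link.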
I made the changes to the example output, but when clicking on the link I get this error: `No match for custom ID: hcp-nrsvuebib-0005.xhtml_otbooks`. In case it's relevant, I also regenerated the org from the original epub, made the changes you suggested, and still got the same error :( As a sanity check, I've pasted the changed example output in case I've done it wrong somehow:
[[#hcp-nrsvuebib-0005.xhtml_otbooks][Old Testament Table of Contents]]
--------------
[[#hcp-nrsvuebib-0010.xhtml_otpt][OLD TESTAMENT]]
--------------
[[#hcp-nrsvuebib-0011.xhtml_bk01][Genesis]]
[[#hcp-nrsvuebib-0011.xhtml_ch01001][1]] |
[[#hcp-nrsvuebib-0010.xhtml_ot][The Old Testament]]
| [[#hcp-nrsvuebib-0011.xhtml_bk01][Genesis]] | [[#hcp-nrsvuebib-0011.xhtml_bk01][Gen]] |
<<hcp-nrsvuebib-0010.xhtml>>
<<hcp-nrsvuebib-0010.xhtml_otpt>>
<<hcp-nrsvuebib-0010.xhtml_ot>>
[[#hcp-nrsvuebib-0005.xhtml_otbooks][The Old Testament]]
<<hcp-nrsvuebib-0011.xhtml>>
<<hcp-nrsvuebib-0011.xhtml_bk01>>
<<hcp-nrsvuebib-0011.xhtml_ch01001>>
[[#hcp-nrsvuebib-0005.xhtml_rbk01][Genesis]]
[[#hcp-nrsvuebib-0005.xhtml_rbk01][Genesis 1]]
Six Days of Creation and the Sabbath
1When God began to create[[#hcp-nrsvuebib-0013.xhtml_fn01001001-1][a]] the heavens and the earth, 2the earth was complete
<<hcp-nrsvuebib-0011.xhtml_ch01002>>
The problem is that `link to [[#ch002.xhtml#twosub][twosub]]` should be `link to [[ch002.xhtml_twosub][twosub]]`. So remove the first `#` and replace the internal `#` with `_`. Once I do that, it works as expected.
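The rewrite described above can be sketched as a small post-processing step over the org output. This is a workaround sketch, not pandoc's own fix: the function name `fix_org_anchors` and both regular expressions are hypothetical, and they only touch links and targets of the `file.xhtml#id` shape seen in this thread.

```python
import re

# Opening part of an org link whose target looks like "#file#id".
LINK = re.compile(r"\[\[#([^\]\[#]+)#([^\]\[#]+)\]")
# Dedicated target of the form <<file#id>>.
TARGET = re.compile(r"<<([^<>#]+)#([^<>#]+)>>")


def fix_org_anchors(text: str) -> str:
    """Drop the leading '#' and replace the inner '#' with '_'."""
    # [[#ch002.xhtml#twosub][twosub]] -> [[ch002.xhtml_twosub][twosub]]
    text = LINK.sub(r"[[\1_\2]", text)
    # <<ch002.xhtml#twosub>> -> <<ch002.xhtml_twosub>>
    text = TARGET.sub(r"<<\1_\2>>", text)
    return text


print(fix_org_anchors("link to [[#ch002.xhtml#twosub][twosub]]"))
```

Links with a single `#` (for example a link to a whole chapter file) are left untouched, since the thread does not establish how those should be rewritten.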
I did two naive find-replaces, `:%s/#hcp/hcp/g` and `:%s/xhtml#/xhtml_/g`, and that fixed some but not all.
`Messiah,[[hcp-nrsvuebib-0137.xhtml_fn40001001-3][c]] the son of David,` is supposed to jump to `[[hcp-nrsvuebib-0136.xhtml_rfn40001001-3][c]] 1.1 Or /Jesus Christ/` but doesn't. When I reconcile their differences, it still doesn't jump. I get this error output: `No match for fuzzy expression: hcp-nrsvuebib-0137.xhtml_fn40001001-3`
I believe the issues here have been fixed by now (esp. the misplaced `#`). Closing this issue as stale.
In pandoc 3.4 I'm still getting broken links from EPUBs. Converting to a single HTML file, I get links like, e.g.,
<p><a href="#part0100.html#c74" class="calibre1">74: GHOSTBLOOD</a></p>
that are all broken. (They are broken regardless of output format.)
(I also posted here https://github.com/jgm/pandoc/issues/6384#issuecomment-2366784449 )
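For anyone trying to build a reproducible case from a freely available EPUB, a quick scan of the generated HTML for hrefs with more than one `#` can confirm whether a given file triggers the problem. This is a minimal sketch using only Python's standard `html.parser`; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect <a href> values that contain more than one '#'."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.count("#") > 1:
                self.suspicious.append(value)


def suspicious_hrefs(html: str) -> list[str]:
    parser = HrefCollector()
    parser.feed(html)
    return parser.suspicious


snippet = '<p><a href="#part0100.html#c74" class="calibre1">74: GHOSTBLOOD</a></p>'
print(suspicious_hrefs(snippet))
```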
@Enivex if you have a reproducible bug, please open a new issue with full information needed to reproduce it.
Sure, though I can't upload this particular EPUB for legal reasons. I'll try to see if I can reproduce it from some freely available one.
First off, I love pandoc, and every time I read its documentation I learn new tricks it can do :D Thank you for all your hard work!
Explain the problem
I realize this may not be a pandoc issue: there are several pieces of software involved in going from a DRMed epub file to an org file, and any one of them may be causing this problem. But the reason I suspect pandoc is that the epub file looks and works fine in Apple Books, Emacs, and Calibre, so I believe the input file is fine. The org file also looks fine but does not work fine, i.e., clicking on links doesn't work. So it seems to me, perhaps naively, that pandoc isn't quite creating the org file correctly.
I may be missing a command line flag or something obvious, but I spent a couple of hours reading the docs and trying to figure it out, so it isn't obvious to me 😅
What Happened
In Emacs, when pressing RET on a link, I get this error output:
`No match for custom ID: hcp-nrsvuebib-0010.xhtml#otpt`
What Did I Expect to Happen
I expected to jump to the link's target.
Inputs
My input file is NRSVue, Holy Bible. If you need a copy to reproduce this let me know and I can provide. I used Calibre with DeACSM and DeDRM plugins to remove DRM.
Command Line Inputs
Below are various commands I used, all producing the same issue, copied and pasted from my terminal. I was grasping at straws to try to solve the problem, and read through nearly all of the man pages but didn't see anything that might help.
$ pandoc -s NRSVue,\ Holy\ Bible\ -\ Zondervan,.epub --from=epub --to=org --output=nrsvue.org
$ pandoc -s NRSVue,\ Holy\ Bible\ -\ Zondervan,.epub --from=epub --to=org --output=nrsvue.org --file-scope
$ pandoc -s NRSVue,\ Holy\ Bible\ -\ Zondervan,.epub --from=epub --to=org --output=nrsvue.org --file-scope --normalize
$ pandoc -s NRSVue,\ Holy\ Bible\ -\ Zondervan,.epub --from=epub --to=org --output=nrsvue.org --file-scope --extract-media=./
$ pandoc -s NRSVue,\ Holy\ Bible\ -\ Zondervan,.epub --from=epub --to=org --output=nrsvue.org --toc
$ pandoc -s NRSVue,\ Holy\ Bible\ -\ Zondervan,.epub --from=epub --to=org --output=nrsvue.org --reference-links
Minimal Output Example
```org-mode
[[#hcp-nrsvuebib-0005.xhtml#otbooks][Old Testament Table of Contents]]
--------------
[[#hcp-nrsvuebib-0010.xhtml#otpt][OLD TESTAMENT]]
--------------
[[#hcp-nrsvuebib-0011.xhtml#bk01][Genesis]]
[[#hcp-nrsvuebib-0011.xhtml#ch01001][1]] |
[[#hcp-nrsvuebib-0010.xhtml#ot][The Old Testament]]
| [[#hcp-nrsvuebib-0011.xhtml#bk01][Genesis]] | [[#hcp-nrsvuebib-0011.xhtml#bk01][Gen]] |
<
```
Software versions