Enivex opened this issue 1 month ago
--extract-media requires an argument (a file path). That is why.
I think I've fixed this issue. Note, however, that the EPUB you uploaded does not define an identifier troubleshooting in installation.html. You can make this work by adding -f html+auto_identifiers, but this seems like a bug in the EPUB.
You're right. I didn't upload it myself; I just looked for one where I was getting similar errors. Turns out this one had broken links even in the EPUB.
I'll try the nightly release later.
Unfortunately the issue is not solved in the original file I was initially interested in. There are 237 missing label errors, even after adding auto_identifiers.
The parts after the _ correspond to ids in the HTML files, but no corresponding labels are created in the .typ file.
For example, in part0111.html:
Edit: Version 3.4-nightly-2024-09-23
What do these anchors point to in the EPUB? You may need to unzip it and inspect the XHTML files contained therein, e.g. look in part0007.html and try to find the thing that has id pre or ack1.
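If unzipping by hand gets tedious, a small script can do the search. This is a minimal sketch using only the Python standard library (the script name find_id.py and the functions in it are hypothetical, not part of pandoc): it opens the EPUB as the ZIP archive it is, parses every HTML/XHTML member, and reports which file and which element carry a given id such as pre or ack1.

```python
import sys
import zipfile
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records (tag, line) for every start tag whose id attribute equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <a id="x"/> also reach this method via the default handle_startendtag.
        if dict(attrs).get("id") == self.wanted:
            self.hits.append((tag, self.getpos()[0]))


def find_id(epub, wanted):
    """Return (member, tag, line) for each HTML/XHTML file in the EPUB defining `wanted`.

    `epub` may be a path or a binary file object; an EPUB is a ZIP archive.
    """
    results = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            finder = IdFinder(wanted)
            finder.feed(zf.read(name).decode("utf-8", errors="replace"))
            finder.close()
            results.extend((name, tag, line) for tag, line in finder.hits)
    return results


if __name__ == "__main__":
    # Usage: python find_id.py book.epub pre
    for member, tag, line in find_id(sys.argv[1], sys.argv[2]):
        print(f"{member}: <{tag}> on line {line}")
```

The tag name in the output is the useful part: per the discussion in this thread, an id on a heading, div, or span can survive the conversion, while an id on a p element cannot.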
ack1 corresponds to another link, which points back to the first one.
pre corresponds to a heading.
If you can give me an EPUB to test with, it would really help, even if it's just stripped down to a couple of these examples (all the better).
There is code that should be changing these ids. At least the pre should work (on a heading). The identifier on the a href might be ignored by the Typst writer.
I don't mind sending it to you for troubleshooting purposes, but I can't post it on GitHub for obvious reasons.
Edit: Sent via email. Hopefully the large attachment doesn't cause issues.
I just tested with LaTeX output instead, and that has the same issue, so it's not only the Typst writer.
Most of the writers won't pay any attention to an identifier attribute on a Link element. (Try HTML.)
Converting to HTML does work, but that's not very surprising, since EPUB is HTML-based.
Yes, but remember that pandoc isn't just moving the HTML from EPUB to the output. It is parsing everything into an intermediate data structure and re-rendering it. If it works with HTML, that shows that the identifier on the link does get parsed and represented in the AST. So the issue is simply that the Typst (and LaTeX) writer doesn't do anything with this attribute.
That makes sense.
If you want to email me the epub, I can look into it further. At least the identifier on the heading should work.
I did email you the EPUB, as I described above, though it may have vanished into the void because of the large (11 MB) attachment.
OK, found it in the junk folder.
OK, here's one example.
error: label `<part0111.html#pre>` does not exist in the document
┌─ twok.typ:328:47
│
328 │ = <part0007.html_page14><part0007.html_page15>#link(label("part0111.html#pre"))[#strong[PRELUDE TO \
So I look in part0111.html in the epub, and here's where the anchor is:
<p class="toc1" id="pre"><a href="part0007.html#pre" class="calibre1">Prelude to the Stormlight Archive</a></p>
Pandoc doesn't put attributes on Para elements, so this identifier was lost in the parsing stage.
The other cases I've looked at are like this. Links to headings, tables, figures, divs, and spans should work fine. Anything else pandoc is going to drop, but those are the lion's share of real uses.
Probably this can be closed.
For your immediate purposes, it might work to use a Lua filter to remove all internal links, so you don't get errors in typst.
Is there a particular reason why pandoc can't keep those ids? In this case it completely breaks the TOC.
Removing the internal links with a filter is probably what I'm going to end up doing short term (having working links is useful for navigation, though).
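A middle ground is to remove only the links that would break. The suggestion in this thread is a Lua filter; as a swapped-in alternative, here is a minimal sketch of a pandoc JSON filter in Python (pandoc also accepts JSON filters via --filter). It collects every identifier defined in the AST and turns only those internal links whose target is undefined into Spans, so the link text stays and Typst has no missing label to report. It assumes an internal link's URL is "#" followed by exactly the identifier it targets; if the EPUB reader rewrites ids and link targets differently, more links get unwrapped, which degrades to the remove-all-internal-links approach. The script name strip_dangling_links.py and its functions are hypothetical.

```python
import json
import sys

# Element types whose "c" payload begins with an Attr [id, classes, key-values].
# Header is handled separately: its payload is [level, attr, inlines].
ATTR_FIRST = {"Div", "Span", "Link", "Image", "Code", "CodeBlock", "Table", "Figure"}


def collect_ids(node, ids):
    """Recursively add every non-empty identifier defined in the AST to `ids`."""
    if isinstance(node, dict):
        kind, payload = node.get("t"), node.get("c")
        attr = None
        if kind == "Header":
            attr = payload[1]
        elif kind in ATTR_FIRST:
            attr = payload[0]
        if attr and attr[0]:
            ids.add(attr[0])
        for value in node.values():
            collect_ids(value, ids)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, ids)
    return ids


def strip_dangling_links(node, ids):
    """Return a copy of `node` where internal links to undefined ids become Spans."""
    if isinstance(node, list):
        return [strip_dangling_links(item, ids) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Link":
            attr, inlines, (url, _title) = node["c"]
            if url.startswith("#") and url[1:] not in ids:
                # Keep the link text (and the link's own attributes) but drop the target.
                return {"t": "Span", "c": [attr, strip_dangling_links(inlines, ids)]}
        return {key: strip_dangling_links(value, ids) for key, value in node.items()}
    return node


def main():
    doc = json.load(sys.stdin)  # pandoc passes the document AST as JSON on stdin
    ids = collect_ids(doc, set())
    json.dump(strip_dangling_links(doc, ids), sys.stdout)


if __name__ == "__main__":
    main()
```

It would be used along the lines of pandoc -f epub -t typst book.epub --standalone --filter ./strip_dangling_links.py -o book.typ (file names are placeholders). Links to headings and other elements that kept their ids are left untouched, so TOC entries that point at headings should keep working.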
A sensible TOC has links to identifiers on headings (e.g. h2 in HTML). These should work fine in a pandoc conversion. This particular document has links all over the place: to p elements, a elements, etc.
Pandoc has no place to put an id attribute on p, because its Para element has no slot for attributes.
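Since spans are among the elements whose ids are said to survive, one possible preprocessing workaround is to move each id from a p element onto an empty span at the start of that paragraph before running pandoc. This is a minimal, untested sketch in Python (standard library only; the script name move_p_ids.py and its functions are hypothetical). It assumes double-quoted id attributes and non-self-closing p tags, uses a regex rather than a real XML parser, and does not touch ids on a elements such as ack1. I have not confirmed that pandoc keeps an empty span with an identifier or that the Typst writer emits a label for it, so treat it as an experiment.

```python
import re
import sys
import zipfile

# An opening <p ...> tag carrying a double-quoted id attribute.
# \b stops <pre> or <param> from matching; self-closing <p/> is not handled.
P_WITH_ID = re.compile(r'<p\b([^>]*?)\s+id="([^"]*)"([^>]*)>')


def move_p_ids(xhtml):
    """Move each <p> id onto an empty <span> placed at the start of the paragraph."""
    return P_WITH_ID.sub(r'<p\1\3><span id="\2"></span>', xhtml)


def rewrite_epub(src, dst):
    """Copy the EPUB at `src` to `dst`, applying move_p_ids to every (X)HTML file."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # original order, so "mimetype" stays first
            data = zin.read(info.filename)
            if info.filename.lower().endswith((".html", ".xhtml", ".htm")):
                data = move_p_ids(data.decode("utf-8")).encode("utf-8")
            # Reuse each entry's compression method; EPUB requires "mimetype" stored.
            zout.writestr(info, data, compress_type=info.compress_type)


if __name__ == "__main__":
    # Usage: python move_p_ids.py original.epub patched.epub
    rewrite_epub(sys.argv[1], sys.argv[2])
```

With the part0111.html example from this thread, <p class="toc1" id="pre"> would become <p class="toc1"><span id="pre"></span>, so the pre id would live on a span instead of the paragraph.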
Explain the problem.
Take an EPUB file that uses internal links, e.g. https://dieterplex.github.io/rust-ebookshelf/The%20Rust%20Programming%20Language.epub
Run
pandoc -f epub -t typst '.\The Rust Programming Language.epub' --standalone -o 'trpl.typ'
The exact options or even the output file type are not very important. The resulting file includes links like
#link(label("ch01-01-installation.html#troubleshooting"))
(there are some other flavors too), which will not work, because the label they refer to does not exist in the document. The closest is <ch01-01-installation.html>, which refers to the entire chapter.
Pandoc version? What version of pandoc are you using, on what OS? (If it's not the latest release, please try with the latest release before reporting the issue.)
pandoc 3.4, Windows 11
A separate issue is that in order for images to work, the files have to be manually extracted from the epub, and then placed correctly in relation to the resulting typ file. (Not sure if I should create a separate issue for this.)
(This particular issue was fixed by adding the --extract-media . option. Not entirely sure why the . is required. Without it I get Couldn't extract ePub file: Did not find end of central directory signature)