jgm / pandoc

Universal markup converter
https://pandoc.org

Org or Markdown to EPUB fails with ampersand in quoted title string #10410

Closed: gregoryg closed this 3 days ago

gregoryg commented 3 days ago

OS and Pandoc version

Pandoc

pandoc 3.5
Features: +server +lua
Scripting engine: Lua 5.4
User data directory: /home/gregj/.local/share/pandoc

OS

NAME="Debian GNU/Linux"
VERSION_CODENAME=trixie

Problem

Pandoc conversion from either Markdown or Org Mode to EPUB fails when a YAML metadata file passed with --epub-metadata contains an ampersand in the title, even when the title value is quoted.

I have not found this error reported in open or closed issues or PRs, or discussed on the wiki.

Files used

simple.org

#+title: Improve Learning Using Neuroscience & AI
* Headline 1
  here is some text

simple.md

# Headline 1
here is some text

metadata.yaml

---
title:
  - type: main
    text: "Improve Learning Using Neuroscience & AI"
creator:
  - role: author
    text: Andrew Huberman
publisher: YouTube
date: 20241118
lang: en
belongs-to-collection: YouTube

Commands run to reproduce

pandoc -f markdown -t epub --epub-metadata=metadata.yaml > simple.epub simple.md
pandoc -f org -t epub --epub-metadata=metadata.yaml > simple.epub simple.org

Pandoc error in both cases

Invalid XML in epub metadata:
ParseError {errorContexts = ["text content"], errorMessage = "Failed reading: parseTextContent", errorPosition = 4:48 (82)}
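The reported position points at line 4 of metadata.yaml, the line holding the quoted title, and the bare "&" sits at column 48 of that line. This indicates the YAML file is being read as XML, where a lone "&" is not well-formed and YAML quoting has no effect. The minimal sketch below reproduces this outside pandoc. It uses Python's expat-based xml.etree.ElementTree as a stand-in for pandoc's own Haskell XML parser, and the <metadata> wrapper element is a hypothetical root added only so the text can be parsed as element content.

```python
import xml.etree.ElementTree as ET

# The YAML file from the report, reproduced verbatim.
METADATA_YAML = """---
title:
  - type: main
    text: "Improve Learning Using Neuroscience & AI"
creator:
  - role: author
    text: Andrew Huberman
publisher: YouTube
date: 20241118
lang: en
belongs-to-collection: YouTube
"""


def parse_as_xml_content(text):
    """Parse `text` as the content of a hypothetical <metadata> root element.

    The opening tag shares the first line of `text`, so the reported line
    numbers match the original file. Returns None if the text is
    well-formed, otherwise the (line, column) expat reports for the first
    error (lines are 1-based, columns 0-based).
    """
    try:
        ET.fromstring("<metadata>" + text + "</metadata>")
        return None
    except ET.ParseError as err:
        return err.position


if __name__ == "__main__":
    # Fails on line 4, where the bare '&' appears.
    print(parse_as_xml_content(METADATA_YAML))
    # Escaping the ampersand as '&amp;' makes the text well-formed.
    print(parse_as_xml_content(METADATA_YAML.replace("&", "&amp;")))
```

Escaping only makes the text parse. It still contains no Dublin Core elements, so it would not supply any metadata to the EPUB.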
jgm commented 3 days ago

--epub-metadata takes a special kind of XML file, not a YAML file. If you want to use YAML metadata this way, you can use --metadata-file.
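Applied to the reproduction commands, the suggestion amounts to replacing --epub-metadata with --metadata-file. The layout of metadata.yaml (title as a list of type/text entries, creator with role/text) appears to match the YAML EPUB metadata format described in the pandoc manual. A minimal Python sketch follows, assuming pandoc is on PATH. The helper name epub_command and the output names simple-markdown.epub and simple-org.epub are hypothetical. Using -o instead of shell redirection is a stylistic choice.

```python
import shutil
import subprocess
from pathlib import Path


def epub_command(source, input_format, metadata_file, output):
    """Build a pandoc argv that reads YAML metadata via --metadata-file.

    --metadata-file reads YAML, so the quoted title containing '&' is plain
    text; pandoc is expected to escape it when writing the EPUB's XML.
    """
    return [
        "pandoc",
        "-f", input_format,
        "-t", "epub",
        f"--metadata-file={metadata_file}",
        "-o", output,
        source,
    ]


if __name__ == "__main__":
    # The two inputs from the report; output names are hypothetical.
    for src, fmt in [("simple.md", "markdown"), ("simple.org", "org")]:
        cmd = epub_command(src, fmt, "metadata.yaml", f"simple-{fmt}.epub")
        print(" ".join(cmd))
        # Only invoke pandoc if it is installed and the input files exist.
        if shutil.which("pandoc") and Path(src).exists() and Path("metadata.yaml").exists():
            subprocess.run(cmd, check=True)
```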

jgm commented 3 days ago

From the docs:

--epub-metadata=FILE

Look in the specified XML file for metadata for the EPUB. The file should contain a series of Dublin Core elements. For example:

 <dc:rights>Creative Commons</dc:rights>
 <dc:language>es-AR</dc:language>
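To keep using --epub-metadata instead, the title has to appear inside a Dublin Core element with the ampersand escaped as &amp;. The sketch below generates such a file from the values in the report's YAML. The tag selection and the mapping of lang to dc:language are assumptions, and only simple, attribute-free elements are written (the creator role and the date are left out). The function name dublin_core_xml is hypothetical.

```python
from xml.sax.saxutils import escape

# Values taken from the report's metadata.yaml; which Dublin Core tags
# they map to is an assumption made for this sketch.
METADATA = [
    ("dc:title", "Improve Learning Using Neuroscience & AI"),
    ("dc:creator", "Andrew Huberman"),
    ("dc:publisher", "YouTube"),
    ("dc:language", "en"),
]


def dublin_core_xml(elements):
    """Render (tag, text) pairs as one Dublin Core element per line.

    escape() replaces '&', '<' and '>' with '&amp;', '&lt;' and '&gt;',
    so the text content stays well-formed XML.
    """
    return "".join(f"<{tag}>{escape(text)}</{tag}>\n" for tag, text in elements)


if __name__ == "__main__":
    # Redirect this output to a file and pass it with --epub-metadata=FILE.
    print(dublin_core_xml(METADATA), end="")
```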