jgm / pandoc

Universal markup converter
https://pandoc.org

pandoc html to asciidoc conversion of <a title target> tags does not work as expected #10087

Open shinguz opened 1 month ago

shinguz commented 1 month ago

Discussed in https://github.com/jgm/pandoc/discussions/10084

Originally posted by **shinguz** August 13, 2024

Hello everyone,

I am currently migrating our CMS from Drupal to Hugo. One step in the automation is the conversion from HTML to AsciiDoc. I tried to use pandoc for this, but unfortunately pandoc does it "wrong":

[Edited to fix code formatting]

```
echo '
<p><a href="https://www.example.com" target="_blank" title="Example title">Example</a></p>
' | pandoc --wrap=none -f html -t asciidoc
```

The result is the following:

```
https://www.example.com[Example]
```

I would have hoped for something like this:

```
https://www.example.com[Example^]
https://www.example.com[Example, title="Example title", window="blank"]
```

I don't feel like fixing the hundreds of links by hand. Is it possible to adjust this in pandoc (configuration, plug-in), or is it not yet implemented (or a bug)? Or is my version (v2.9.2.1) too old? Do any of you know more about this? I haven't found anything quickly, but I don't know exactly where to look...

Many thanks to you all for a hint!

alerque commented 1 month ago

This user is having trouble copy pasting HTML and AsciiDoc into GitHub's markdown with no escaping. This issue is supposed to be a feature request for title attributes to carry over to the AsciiDoc writer. My comment in #10084 has links to the docs for that.

jgm commented 1 month ago

OK. So, here is present behavior:

    % pandoc -f html -t asciidoc
    <p><a href="https://www.example.com" target="_blank" title="Example title">Example</a></p>
    ^D
    https://www.example.com[Example]

Reproducing the link here: https://docs.asciidoctor.org/asciidoc/latest/macros/link-macro-attribute-parsing/

I think these are features that were added to AsciiDoc more recently. Actually, you can see that it's quite complex; we must also give special handling to any case where the link text contains a comma or equals sign!
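
To make the comma and equals-sign problem concrete, here is a rough Python sketch of the quoting logic a writer would need. It is not pandoc's implementation: the function name `asciidoc_link` and its parameters are hypothetical, and the rules it encodes are assumptions taken from my reading of the Asciidoctor page linked above (the bracket content is parsed as an attribute list once it contains an equals sign, and link text containing a comma or equals sign must then be enclosed in double quotes, with inner quotes backslash-escaped).

```python
def asciidoc_link(url, text, title=None, window=None):
    """Render an AsciiDoc URL macro (hypothetical helper, not pandoc code)."""
    # Named attributes to emit after the link text.
    attrs = []
    if title is not None:
        attrs.append('title="{}"'.format(title.replace('"', '\\"')))
    if window is not None:
        attrs.append("window={}".format(window))

    # A closing bracket in the text would end the macro early, so escape it.
    text = text.replace("]", "\\]")

    # Assumption: the bracket content is treated as an attribute list
    # as soon as it contains an equals sign anywhere.
    parsed_as_attrlist = bool(attrs) or "=" in text

    # In that case, protect commas, equals signs, and quotes in the link
    # text by wrapping it in double quotes.
    if parsed_as_attrlist and any(ch in text for ch in ',="'):
        text = '"{}"'.format(text.replace('"', '\\"'))

    return "{}[{}]".format(url, ",".join([text] + attrs))


print(asciidoc_link("https://www.example.com", "Example",
                    title="Example title", window="_blank"))
print(asciidoc_link("https://example.org", "Google, DuckDuckGo, Ecosia",
                    window="_blank"))
```

Link text with a comma but no attributes is left unquoted here, since under the assumption above it would not be parsed as an attribute list at all. The sketch does not handle a closing bracket inside the title.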

So, we need to do two things: