sergiocorreia / panflute

A Pythonic alternative to John MacFarlane's pandocfilters, with extra helper functions
http://scorreia.com/software/panflute/
BSD 3-Clause "New" or "Revised" License

Convert `panflute.convert_text` output to html #191

Closed: perrette closed this issue 3 years ago

perrette commented 3 years ago

What is the best way to produce HTML from the parsed output of `panflute.convert_text`?

Use case: converting Markdown to HTML with a custom HTML template for images. I need to inject the caption into the template and return the styled image as a `pf.RawInline` instance. The caption has already been parsed, and other filters (links, citeproc, ...) have been applied to it. If I could just convert it back to HTML, the problem would be solved.

I started using the `walk` method, but it is not straightforward, and it duplicates work that pandoc already does. How can I use the underlying pandoc/panflute machinery to convert to HTML and avoid doing the work twice?

Alternatively, I guess I could try to parse the template itself using placeholders and then replace them, but again that seems too complicated. I'm sure I'm missing a simple solution.

Thanks much for any suggestion.

ickc commented 3 years ago

I don't quite follow your use case. Could you explain it in more detail, step by step? E.g., the crucial part is what you want to do with the template.

`convert_text` can convert between any formats pandoc supports, not only to and from the AST, so what your title asks for is doable. But your use case description is more complicated than that.

perrette commented 3 years ago

OK, I see I keep underestimating `convert_text`. When I tried to use it, I ran into errors. I'll give it another try.

My use case is to convert `![caption text](path/to/image.png){ title="image title" source="image source" }` into something like

```html
<div>
<h2> image title </h2>
<p> image caption <span class=...>image source</span></p>
</div>
```

with a few more elements and keywords. The template is rendered with jinja2 as follows:

```python
import jinja2

# title, caption and source are taken from the image element
html = jinja2.Template("""<div>
<h2> {{title}} </h2>
<p> {{caption}} <span class=...> {{source}} </span> </p>
</div>""").render(title=title, caption=caption, source=source)
```

That's why I need to convert the caption to HTML directly. I am aware that I could build such a template with native panflute classes, but I'd prefer to stick to jinja2 for readability. Both caption and source may contain links and BibTeX references.
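Since the caption goes into the template as a ready-made HTML string, it matters whether jinja2 escapes it. A `jinja2.Template` built directly does not autoescape by default, so the HTML passes through untouched; if autoescaping is switched on, the string has to be marked as trusted, e.g. with `markupsafe.Markup` (or the `|safe` filter). A small sketch, where `TEMPLATE` is a hypothetical trimmed-down version of the template above:

```python
import jinja2
from markupsafe import Markup

# Hypothetical, trimmed-down version of the figure template
TEMPLATE = "<div><h2>{{ title }}</h2><p>{{ caption }}</p></div>"

caption_html = "A <em>styled</em> caption"

# Default Template(): autoescape is off, so the HTML caption is inserted verbatim
plain = jinja2.Template(TEMPLATE).render(title="Figure", caption=caption_html)

# With autoescape on, a plain str is escaped (<em> becomes &lt;em&gt;)...
escaped = jinja2.Template(TEMPLATE, autoescape=True).render(
    title="Figure", caption=caption_html
)

# ...so the pandoc output has to be marked as trusted HTML instead
trusted = jinja2.Template(TEMPLATE, autoescape=True).render(
    title="Figure", caption=Markup(caption_html)
)

print(plain)
```

Marking only the pandoc-generated caption as `Markup` keeps escaping active for the other, raw-text fields such as the title.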

The pandoc call involves three filters:

  1. Move the `source` keyword into the caption. This requires a first call to `convert_text` on the raw `source` field, since the caption is already AST while `source` is Markdown.
  2. Run citeproc (and other standard filters) to process references in the caption.
  3. Recover the figure caption and other keywords, render the template as explained above, and return a `RawInline` element. This is where I need to convert the caption from AST to HTML.

I hope that is a clearer explanation. There are probably other ways to achieve this, but this one seems to work for me. I'm only stuck on extracting the figure caption in HTML format. I understand now that I should try `convert_text` again.

perrette commented 3 years ago

I solved the issue by wrapping the content of my image element in a `Para`:

```python
import panflute as pf
pf.convert_text(pf.Para(*elem.content), 'panflute', 'html')  # elem: the pf.Image
```

Thanks for your hint and the great project.