Ok, so I built a simple script to try to do this. Here's `ref.md`:
```markdown
- item
    - subitem
- item
```
Here's `write.py`:
```python
import panflute as pf


def action(elem, doc):
    if isinstance(elem, pf.elements.BulletList):
        print(elem)


if __name__ == "__main__":
    with open("ref.md") as fs:
        markdown = fs.read()
    doc = pf.convert_text(markdown, standalone=True)
    doc.walk(action)
```
If I run `python write.py`, I get the following output:
```
BulletList(ListItem(Plain(Str(subitem))))
BulletList(ListItem(Plain(Str(item)) BulletList(ListItem(Plain(Str(subitem))))) ListItem(Plain(Str(item))))
```
Basically, `doc.walk` visits the most deeply nested items before it reaches the shallower ones (it processes an element's children before the element itself). So I'm not sure how I can use `panflute` to achieve what I want.
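To see why the inner list shows up first, here is a small sketch that walks a hand-written copy of the AST for `ref.md` in both orders. It assumes pandoc's JSON AST shape (dicts with `"t"` and `"c"` keys, as printed by `pandoc -t json`) instead of panflute objects, so it runs without either tool installed; `post_order` only mimics the order in which `doc.walk` calls the action, it is not panflute's own code.

```python
# A hand-written AST for ref.md in the shape pandoc's JSON uses:
# {"t": <element type>, "c": <contents>}; a BulletList's contents
# are a list of items, and each item is a list of blocks.
def plain(text):
    return {"t": "Plain", "c": [{"t": "Str", "c": text}]}


INNER = {"t": "BulletList", "c": [[plain("subitem")]]}
OUTER = {"t": "BulletList", "c": [[plain("item"), INNER], [plain("item")]]}


def children(node):
    """Child blocks of a BulletList; other blocks are treated as leaves here."""
    if node["t"] == "BulletList":
        return [block for item in node["c"] for block in item]
    return []


def pre_order(node, visit):
    visit(node)                  # the parent is visited first (shallow-first)
    for child in children(node):
        pre_order(child, visit)


def post_order(node, visit):
    for child in children(node):
        post_order(child, visit)
    visit(node)                  # children first, matching the output above


def bullet_list_order(walker):
    """Label each BulletList 'outer' or 'inner' in the order `walker` visits it."""
    seen = []

    def visit(node):
        if node["t"] == "BulletList":
            seen.append("outer" if node is OUTER else "inner")

    walker(OUTER, visit)
    return seen


print(bullet_list_order(post_order))  # ['inner', 'outer']
print(bullet_list_order(pre_order))   # ['outer', 'inner']
```

The only difference between the two walkers is whether `visit(node)` runs before or after the recursive calls.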
I've come across this with "classic style" writers in Lua, and there is a thread on the pandoc mailing list about this issue. So I end up with the following (more specific) questions:

… `panflute` to detect elements in the AST and type stuff out to a terminal in the format I like (LaTeX or whatever). That's essentially what a `pandoc` writer is. However, if I do it as shown in the above example, I can't apply `pandoc` templates to it. Or can I?

Looks like this has also been discussed here: https://github.com/sergiocorreia/panflute/issues/84, though rather inconclusively, in my opinion...
Something like this should work:
```python
import panflute as pf


def action(elem, doc):
    if isinstance(elem, pf.BulletList):
        # Instead of pf.stringify(item) you can also do item.content[0].content[0].text
        text = '\n'.join(pf.stringify(item) for item in elem.content)
        text = text.split('\n')
        text = ''.join('\n    ' + row for row in text)
        text = '\n\\begin{itemize}' + text + '\n\\end{itemize}'
        return pf.CodeBlock(text)
    elif isinstance(elem, pf.ListItem) and isinstance(elem.parent, pf.BulletList):
        text = r'\item ' + pf.stringify(elem)
        return pf.ListItem(pf.Plain(pf.Str(text)))


if __name__ == "__main__":
    with open("ref.md") as fs:
        markdown = fs.read()
    doc = pf.convert_text(markdown, standalone=True)
    doc.walk(action)
    print(doc.content[0].text)
```
Output:
```
\begin{itemize}
    \item item1
    \begin{itemize}
        \item subitem1
    \end{itemize}
    \item item2
\end{itemize}
```
Basically, this filter creates a code block (it could be any element, really) that stores the formatted text. It's a bit more cumbersome than I would have liked, but if you don't care much about maintaining indentation as you go deeper into the nesting, you can simplify the `join`/`split` lines.
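For comparison, the indentation that those `join`/`split` lines produce can be sketched in plain Python with the standard library's `textwrap.indent`. `wrap_itemize` is a hypothetical helper name, not part of panflute, and unlike the filter it does not emit a leading newline before `\begin{itemize}`.

```python
import textwrap


def wrap_itemize(items, indent="    "):
    """Wrap already-formatted \\item lines in an itemize environment.

    Every line of the body, including the lines of any nested environment,
    is indented by `indent` (textwrap.indent skips whitespace-only lines).
    """
    body = "\n".join(items)
    return "\\begin{itemize}\n" + textwrap.indent(body, indent) + "\n\\end{itemize}"


# Build the inner list first, then embed it in the outer item (bottom-up,
# the same order in which the filter sees the lists).
inner = wrap_itemize([r"\item subitem"])
outer = wrap_itemize([r"\item item" + "\n" + inner, r"\item item"])
print(outer)
```

Because the nested environment is indented again as part of the outer body, each level of nesting adds one more `indent` without any extra bookkeeping.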
Also, note that here we exploit the fact that the walk goes depth-first, formatting the more deeply nested items first. You could also write a custom walker that goes shallow-first and thus simplify the filter code.
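A shallow-first alternative is to skip `walk` and write a recursive function that starts at the root and returns a string for each node, which is closer to how a writer works. The sketch below does this over pandoc's JSON AST (the output of `pandoc -t json`) using only the standard library. It handles only `BulletList`, `Plain`/`Para`, `Str`, `Space` and `SoftBreak`, and `to_latex`/`block_to_latex` are hypothetical names, not panflute or pandoc API.

```python
import json


def inlines_to_latex(inlines):
    """Render inline nodes; only Str, Space and SoftBreak are handled here."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])        # copied unescaped in this sketch
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def block_to_latex(block, depth=0):
    """Render one block, top-down; `depth` controls indentation of nested lists."""
    pad = "    " * depth
    if block["t"] in ("Plain", "Para"):
        return pad + inlines_to_latex(block["c"])
    if block["t"] == "BulletList":
        lines = [pad + r"\begin{itemize}"]
        for item in block["c"]:            # each item is a list of blocks
            if not item:                   # an empty bullet
                lines.append(pad + "    " + r"\item")
                continue
            first, *rest = item
            lines.append(pad + "    " + r"\item " + block_to_latex(first).strip())
            for sub in rest:               # e.g. a nested list, one level deeper
                lines.append(block_to_latex(sub, depth + 1))
        lines.append(pad + r"\end{itemize}")
        return "\n".join(lines)
    raise NotImplementedError("no rule for " + block["t"])


def to_latex(doc):
    """Render the top-level blocks of a pandoc JSON document."""
    return "\n\n".join(block_to_latex(b) for b in doc["blocks"])


# Hand-written equivalent of the list in ref.md; the "pandoc-api-version"
# and "meta" keys that pandoc also emits are omitted for brevity.
AST = json.loads("""
{"blocks": [
  {"t": "BulletList", "c": [
    [{"t": "Plain", "c": [{"t": "Str", "c": "item"}]},
     {"t": "BulletList", "c": [[{"t": "Plain", "c": [{"t": "Str", "c": "subitem"}]}]]}],
    [{"t": "Plain", "c": [{"t": "Str", "c": "item"}]}]
  ]}
]}
""")

print(to_latex(AST))
```

In real use the literal would be replaced by `json.load(sys.stdin)` and fed with something like `pandoc -t json ref.md | python writer.py`, where `writer.py` is a hypothetical file name. Since the parent decides how to wrap its children, no walker trickery is needed.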
@sergiocorreia thanks for your response. That works.

The end result, however, is still wrapped in a `CodeBlock`. This is a problem for me because I'd like to use `panflute` in conjunction with a template which I already have defined:
```latex
\documentclass[a4paper]{article}
\usepackage{cite}
\usepackage[nonumberlist]{glossaries}
\usepackage{hyperref}
\usepackage[margin=2cm]{geometry}
\usepackage{graphicx}
\usepackage{array}
\usepackage{mfirstuc}
\usepackage[official]{eurosym}
\makeglossaries
\graphicspath{{./images/}}
\newglossaryentry{LabView}
{
name={LabView},
description={
system-design platform and development environment for associated visual
programming language%
}
}
\newglossaryentry{VI}
{
name={VI},
description={Virtual Instrument}
}
\begin{document}
\tableofcontents
\clearpage
$body$
\clearpage
\bibliography{bibliography}
\bibliographystyle{abbrv}
\clearpage
\printglossaries
\end{document}
```
So the result of whatever gets processed by `panflute` is dumped into the `$body$` placeholder. I rewrote your code in the `panflute` filter format:
```python
import panflute as pf


def action(elem, doc):
    if isinstance(elem, pf.elements.BulletList):
        text = '\n'.join(pf.stringify(item) for item in elem.content)
        text = text.split('\n')
        text = ''.join('\n    ' + row for row in text)
        text = '\n\\begin{itemize}' + text + '\n\\end{itemize}'
        return pf.CodeBlock(text)
    elif isinstance(elem, pf.ListItem) and isinstance(elem.parent, pf.BulletList):
        text = r'\item ' + pf.stringify(elem)
        return pf.ListItem(pf.Plain(pf.Str(text)))


def main(doc=None):
    return pf.run_filter(action, doc=doc)


if __name__ == "__main__":
    main()
```
Using the above, I ran the following command:

```
pandoc -F write.py --template custom_template.latex ref.md
```

which yielded the following output:
```
\documentclass[a4paper]{article}
\usepackage{luacode}
\usepackage{cite}
\usepackage[nonumberlist]{glossaries}
\usepackage{hyperref}
\usepackage[margin=2cm]{geometry}
\usepackage{graphicx}
\usepackage{array}
\usepackage{mfirstuc}
\usepackage[official]{eurosym}
\usepackage{luacode}
\makeglossaries
\graphicspath{{./images/}}
\newglossaryentry{LabView}
{
name={LabView},
description={
system-design platform and development environment for associated visual
programming language%
}
}
\newglossaryentry{VI}
{
name={VI},
description={Virtual Instrument}
}
\begin{document}
\tableofcontents
\clearpage
<pre><code>
\begin{itemize}
    \item item
    \begin{itemize}
        \item subitem
    \end{itemize}
    \item item
\end{itemize}</code></pre>
\clearpage
\bibliography{bibliography}
\bibliographystyle{abbrv}
\clearpage
\printglossaries
\end{document}
```
As you can see, the content is added where it needs to be (i.e. at `$body$`), but the `<pre>` tag is still there, which makes sense because the script you proposed wraps everything in a `pf.CodeBlock`.
So technically, `pandoc` is still writing HTML (the default), because `panflute` acts as a filter and not a writer. I'd like to circumvent the `pandoc` writer and dump straight into my template. Is there some sort of workaround?
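If the goal is only to fill `$body$`, one possible workaround is to skip pandoc's writer and template engine altogether and do the substitution in Python. The sketch below assumes the body has already been rendered to a LaTeX string by your own code; `fill_template` is a hypothetical helper, and it only handles bare `$name$` variables, not pandoc's `$if(...)$`/`$for(...)$` syntax or its `$$` escape.

```python
import re


def fill_template(template, variables):
    """Replace $name$ placeholders with values from `variables`.

    Unknown names are left untouched so that inline math such as $x$ survives.
    A function (not a string) is passed to re.sub, so backslashes in the
    LaTeX values are inserted literally instead of being read as escapes.
    """
    def substitute(match):
        return variables.get(match.group(1), match.group(0))

    return re.sub(r"\$([A-Za-z][A-Za-z0-9_-]*)\$", substitute, template)


template = "\\begin{document}\n\\tableofcontents\n\\clearpage\n$body$\n\\clearpage\n\\end{document}"
body = "\\begin{itemize}\n    \\item item\n\\end{itemize}"
print(fill_template(template, {"body": body}))
```

The trade-off is that you lose pandoc's template features (metadata variables, conditionals, partials), so this only makes sense for simple templates like the one above.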
Maybe replacing `CodeBlock` with something else would work? E.g. using `Plain`?
Yes, thanks! But you do have to wrap the text in a `Str` object first. This is what worked for me:
```python
import panflute as pf


def action(elem, doc):
    if isinstance(elem, pf.elements.BulletList):
        text = '\n'.join(pf.stringify(item) for item in elem.content)
        text = text.split('\n')
        text = ''.join('\n    ' + row for row in text)
        text = '\n\\begin{itemize}' + text + '\n\\end{itemize}'
        return pf.Plain(pf.Str(text))
    elif isinstance(elem, pf.ListItem) and isinstance(elem.parent, pf.BulletList):
        text = r'\item ' + pf.stringify(elem)
        return pf.ListItem(pf.Plain(pf.Str(text)))


def main(doc=None):
    return pf.run_filter(action, doc=doc)


if __name__ == "__main__":
    main()
```
Let me just add that the `pandoc` command must be adjusted after this. Otherwise pandoc's LaTeX writer (selected here by the `.tex` output extension) kicks in and escapes the backslashes in the `Str` text, which we don't want in this case. Passing `-t plain` selects the plain-text writer instead, which does not apply LaTeX escaping:

```
pandoc -F write.py -t plain --template custom_template.latex -o test.tex test.md
```
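To illustrate the kind of escaping a LaTeX writer applies to ordinary text, here is a simplified sketch covering a handful of special characters. The table is an assumption chosen for illustration, not pandoc's actual escaping table (which covers more characters and may emit slightly different sequences).

```python
# A small subset of the characters LaTeX treats specially, with one common
# text-mode escape for each (simplified; not pandoc's real table).
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
}


def escape_latex(text):
    """Escape character by character so inserted escapes are never re-escaped."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


print(escape_latex(r"\begin{itemize}"))  # \textbackslash{}begin\{itemize\}
```

This is why `\item` stored in a `Str` would not survive a LaTeX writer as a command; it would be turned into text that prints a literal backslash.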
Just to explain what happened in your first try above: the `<code>` and `<pre>` tags appear because `pandoc -F write.py --template custom_template.latex ref.md` does not specify an output format (so it defaults to HTML), and you wrap your LaTeX code in a `CodeBlock`, which is like a listing in LaTeX. You should get the desired output by replacing `return pf.CodeBlock(text)` with `return pf.RawBlock(text, 'latex')` (equivalently `pf.RawBlock(text, format='latex')`, since `format` is the second parameter) and by specifying on the command line either `-t latex` or an output file ending in `.tex`.
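For reference, a `RawBlock` in pandoc's JSON AST (the format a filter reads on stdin and writes to stdout) is `{"t": "RawBlock", "c": [format, text]}`, and the LaTeX writer emits its text verbatim instead of escaping it. The sketch below is a stdlib-only filter, assumed as an alternative to the panflute version, that replaces each top-level `BulletList` with a `latex` raw block. It only handles `Plain`/`Para` items made of `Str`/`Space`/`SoftBreak`, and `filter_doc`/`list_to_latex` are hypothetical names.

```python
import json
import sys


def inline_text(inlines):
    """Flatten Str/Space/SoftBreak inlines to text; other inlines are dropped."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])        # copied unescaped: it becomes raw LaTeX
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
    return "".join(parts)


def list_to_latex(bullet_list):
    """Turn a BulletList node into an itemize environment, recursing into sublists."""
    lines = [r"\begin{itemize}"]
    for item in bullet_list["c"]:
        prefix = r"\item "
        for block in item:
            if block["t"] == "BulletList":
                lines.append(list_to_latex(block))
            elif block["t"] in ("Plain", "Para"):
                lines.append(prefix + inline_text(block["c"]))
                prefix = ""                # only the first paragraph gets \item
    lines.append(r"\end{itemize}")
    return "\n".join(lines)


def filter_doc(doc):
    """Replace each top-level BulletList with a RawBlock in the latex format."""
    doc["blocks"] = [
        {"t": "RawBlock", "c": ["latex", list_to_latex(b)]} if b["t"] == "BulletList" else b
        for b in doc["blocks"]
    ]
    return doc


# Pandoc runs a JSON filter with the output format as its first argument,
# sends the AST on stdin and reads the modified AST back from stdout.
if __name__ == "__main__" and len(sys.argv) > 1:
    json.dump(filter_doc(json.load(sys.stdin)), sys.stdout)
```

It could be run as, e.g., `pandoc -F raw_filter.py -t latex --template custom_template.latex ref.md`, where `raw_filter.py` is a hypothetical file name. All other keys of the document (such as `pandoc-api-version` and `meta`) are passed through unchanged.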
Is it possible to trick `panflute` into acting more like a `pandoc` writer than a `pandoc` filter? In particular, I am interested in parsing the AST to go from something like this in Markdown:

…

To something like this in LaTeX:

…

I know there are already built-in writers for this in `pandoc`, but I'm very much interested in building my own. How do I go about doing this, and where can I start?
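One possible starting point, sketched under the assumption that you are happy to work from pandoc's JSON AST (`pandoc -t json`) instead of going through a filter: write a top-down writer with one method per element type, similar in spirit to the function-per-element layout of pandoc's classic Lua custom writers (pandoc also accepts a Lua writer directly via `-t mywriter.lua`). `LatexWriter` and its `block_*`/`inline_*` methods are hypothetical names, and only a few element types are covered.

```python
import json


class LatexWriter:
    """Top-down writer: each block_*/inline_* method returns a LaTeX string."""

    def write(self, doc):
        return "\n\n".join(self.block(b) for b in doc["blocks"])

    # Dispatch on the element type ("t") to a method of the same name.
    def block(self, node):
        method = getattr(self, "block_" + node["t"], None)
        if method is None:
            raise NotImplementedError("block type " + node["t"])
        return method(node.get("c"))

    def inlines(self, nodes):
        return "".join(self.inline(n) for n in nodes)

    def inline(self, node):
        method = getattr(self, "inline_" + node["t"], None)
        if method is None:
            raise NotImplementedError("inline type " + node["t"])
        return method(node.get("c"))   # Space has no "c", so this passes None

    # Blocks
    def block_Header(self, c):
        level, _attr, content = c
        commands = ["section", "subsection", "subsubsection"]
        command = commands[min(level, len(commands)) - 1]
        return "\\" + command + "{" + self.inlines(content) + "}"

    def block_Para(self, c):
        return self.inlines(c)

    block_Plain = block_Para

    def block_BulletList(self, items):
        lines = [r"\begin{itemize}"]
        for item in items:             # each item is a list of blocks
            body = "\n".join(self.block(b) for b in item)
            lines.append(r"\item " + body)
        lines.append(r"\end{itemize}")
        return "\n".join(lines)

    # Inlines
    def inline_Str(self, c):
        return c                       # a real writer would escape LaTeX here

    def inline_Space(self, c):
        return " "

    inline_SoftBreak = inline_Space

    def inline_Emph(self, c):
        return r"\emph{" + self.inlines(c) + "}"

    def inline_Strong(self, c):
        return r"\textbf{" + self.inlines(c) + "}"


# Hand-written AST for "# Intro" followed by a two-item list.
DOC = json.loads("""
{"blocks": [
  {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
  {"t": "BulletList", "c": [
    [{"t": "Plain", "c": [{"t": "Emph", "c": [{"t": "Str", "c": "item"}]}]}],
    [{"t": "Plain", "c": [{"t": "Str", "c": "second"}, {"t": "Space"},
                          {"t": "Str", "c": "item"}]}]
  ]}
]}
""")
print(LatexWriter().write(DOC))
```

Adding support for a new element means adding one method, and because rendering starts at the root, each element controls how its children are wrapped. The resulting string could then be dropped into `$body$` of your own template.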