sergiocorreia / panflute

A Pythonic alternative to John MacFarlane's pandocfilters, with extra helper functions
http://scorreia.com/software/panflute/
BSD 3-Clause "New" or "Revised" License

Newlines appearing in Divs #106

Closed by mpjuers 4 years ago

mpjuers commented 5 years ago

I have the following input:

<div data-label="286335">A scatterplot by By DanielPenfield - Own work, CC BY-SA 3.0, <a href="https://commons.wikimedia.org/w/index.php?curid=9402369">https://commons.wikimedia.org/w/index.php?curid=9402369</a>.</div>

and this filter:

def labels(elem, doc, replacements):
    if (       isinstance(elem, pf.Header)
            or isinstance(elem, pf.Div)):
        try:
            if 'label' in elem.attributes.keys():
                elem.identifier = elem.attributes['label']
                if isinstance(elem, pf.Header):
                    return elem
                else:
                    return pf.Div(
                        pf.RawBlock(
                            ''.join(
                                [r'\caption{\hypertarget{',
                                 elem.attributes['label'],
                                 r'}{\label{',
                                 elem.attributes['label'],
                                 r'}}%']
                            ),
                            format='latex'
                        ),
                        elem.content[0],
                        pf.RawBlock(r'}', format='latex'),
                    )
        except AttributeError:
            pass
        # Extra section headers with no content can cause an IndexError.
        except IndexError:
            pass
        except KeyError:
            pass

When I run this filter on my input I get

\caption{\hypertarget{286335}{\label{286335}}%

A scatterplot by By DanielPenfield - Own work, CC BY-SA 3.0,
\href{https://commons.wikimedia.org/w/index.php?curid=9402369}{https://commons.wikimedia.org/w/index.php?curid=9402369}.

}

The newlines between the elements of Div prevent the document from compiling correctly. How can I get rid of them?

sergiocorreia commented 5 years ago

Pandoc inserts empty lines between blocks. In your case, the Div you are returning contains three items: a RawBlock, then a Plain, and finally another RawBlock.

The solution is to add the two LaTeX fragments as RawInline elements instead of RawBlocks, and to insert them directly into the content of the Plain element.

I cleaned up your example to fix some bugs ("label" instead of "data-label", an extra replacements argument) and to simplify it. A working example is below:

import panflute as pf

def labels(elem, doc):
    # Only handle Divs that carry a data-label attribute.
    if isinstance(elem, pf.Div) and 'data-label' in elem.attributes:
        elem.identifier = label = elem.attributes['data-label']
        # \caption{ is left open here and closed by the final RawInline below.
        mytext = r'\caption{\hypertarget{' + label + r'}{\label{' + label + r'}}%'
        # Put the raw LaTeX inside the first child (a Plain) as inlines,
        # so pandoc has no block boundaries to separate with blank lines.
        first_child = elem.content[0]
        first_child.content.insert(0, pf.RawInline(mytext, format='latex'))
        first_child.content.append(pf.RawInline(r'}', format='latex'))
        return pf.Div(first_child)

def main(doc=None):
    return pf.run_filter(labels, doc=doc)

if __name__ == "__main__":
    main()

Output:

\caption{\hypertarget{286335}{\label{286335}}%A scatterplot by By
DanielPenfield - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=9402369.}

Reflecting a bit more, the filter would be simpler if we could treat the element as a list, so that its contents could be modified in place. That might be something to do in the next update.

mpjuers commented 5 years ago

This mostly works for me, although using data-label does not seem to work. I changed it back to label and it worked fine. Thanks!
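
One possible explanation, which is an assumption and not confirmed in this thread, is that pandoc's HTML reader drops the data- prefix from attribute names, so the Div reaches the filter with a label key. A filter that should work either way could look the value up under both names. The helper below is hypothetical: find_label and its default key tuple are not part of panflute, and it accepts any mapping, such as elem.attributes.

```python
def find_label(attributes, keys=('data-label', 'label')):
    """Return the value stored under the first matching key, or None."""
    for key in keys:
        if key in attributes:
            return attributes[key]
    return None

# Plain dicts stand in for elem.attributes here.
print(find_label({'label': '286335'}))
print(find_label({'data-label': '286335'}))
print(find_label({'class': 'figure'}))
```

Inside the filter, the two lines that test for and read 'data-label' could then be replaced by a single call such as label = find_label(elem.attributes), followed by a check for None.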