sergiocorreia / panflute

A Pythonic alternative to John MacFarlane's pandocfilters, with extra helper functions
http://scorreia.com/software/panflute/
BSD 3-Clause "New" or "Revised" License

doc.walk order of traversal #196

Closed by rschram 2 years ago

rschram commented 3 years ago

What is the order of traversal of elements when one uses doc.walk()?

I have been playing around with panflute for the first time, and from what I can see it looks like doc.walk is a depth-first search with a specific ordering of the visited nodes that applies a function to the terminal nodes first. (I don't know enough graph theory to comment more specifically.) It seems like a walk of [a [b] [c [d]]] will apply a function to a, b, d, c. In my case, I would like to assign a unique id to several nested Spans in a Doc based on their location in the Doc tree, so I hoped that a walk would apply a function to each node that is visited in a depth-first search in the order they are visited, that is, parents before children.

According to this issue, Pandoc Lua filters do not traverse in a "linear order." Does this also apply to panflute's walk of a Doc? Is there a way around this?

Thanks for your help.

sergiocorreia commented 3 years ago

Hi Ryan,

This is kinda done depth-first, but not exactly. In your example, panflute would do "b a d c". The reason is that it applies the function to the children before applying it to the parent (there was a rationale for this that I can't recall).

Assuming you are running panflute in isolation (i.e. not using the autofilter options), then you can just:

  1. Look at the code for walk here
  2. What you need to do is create a new walk function where line 275 is moved to line 252, and then set panflute.walk = mywalk. Then your filter should run depth-first in the order you want.
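
The difference between the two orderings can be illustrated with a minimal sketch. This is not panflute's code: `Node`, `walk_children_first`, and `walk_parent_first` are hypothetical names for a toy tree, and the real walk also handles replacing elements with the function's return value, which is left out here. Moving the function call from after the recursion to before it is the only change between the two versions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Toy stand-in for a document element (hypothetical, not a panflute class)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def walk_children_first(node: Node, action: Callable[[Node], None]) -> None:
    """Recurse into the children, then apply the action to the node itself."""
    for child in node.children:
        walk_children_first(child, action)
    action(node)


def walk_parent_first(node: Node, action: Callable[[Node], None]) -> None:
    """Apply the action to the node itself, then recurse into the children."""
    action(node)
    for child in node.children:
        walk_parent_first(child, action)


# A small example tree: a has children b and d, and b has child c.
tree = Node("a", [Node("b", [Node("c")]), Node("d")])

children_first: List[str] = []
walk_children_first(tree, lambda n: children_first.append(n.name))

parent_first: List[str] = []
walk_parent_first(tree, lambda n: parent_first.append(n.name))

print(children_first)  # ['c', 'b', 'd', 'a']
print(parent_first)    # ['a', 'b', 'c', 'd']
```

Both versions are depth-first; they differ only in whether a node is handled before or after its descendants.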

rschram commented 3 years ago

Thanks for the advice. Because I don't know Python well, I was not able to follow your suggestion exactly, but I did move line 275 above line 252 in my local base.py (saving a copy of the original first, of course). I ran my filter with this edited version of base.py, and the walk of the document was strictly depth-first, parents before children: [a [b [c]] [d]] -> [a1 [b2 [c3]] [d4]]. Using .parent and .index, I can now assign identifiers to a hierarchy of Divs based on position within the tree. E.g., this is the output of the new filter :)

::: {mode="dd" label="test" id="0"}
Top level

::: {mode="fid" label="test" id="0_1"}
Second level

::: {mode="meta" label="test" id="0_1_1"}
This is the third level
:::

End of second level
:::

::: {mode="slip" label="test" id="0_2"}
Second level ii
:::

End of first level
:::

This is almost all the way to where I need to go. (I would like to assign hierarchical identifiers to the Divs and Spans used throughout a larger document to identify and annotate selected regions of text, so I will need to do something other than concatenating the parent's id attribute to the child.) Thanks again for this insight into how panflute walks through a tree.
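
For anyone who lands here with the same goal, one possible way to build such identifiers without reading the parent's id attribute is to climb the parent links and collect each node's position among its siblings. The sketch below is only an illustration on a toy tree: `Node`, `add`, `path_id`, and `assign_ids` are hypothetical names, not panflute API, and the numbering scheme (root is "0", children are counted from 1) is an assumption copied from the output shown above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes are compared by identity, not by field values
class Node:
    """Toy stand-in for a Div or Span (hypothetical, not a panflute class)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)
    ident: str = ""

    def add(self, child: "Node") -> "Node":
        """Attach a child, record its parent, and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def path_id(node: Node) -> str:
    """Build an id from the node's position by climbing the parent links."""
    parts: List[str] = []
    current = node
    while current.parent is not None:
        # 1-based position of the current node among its siblings
        parts.append(str(current.parent.children.index(current) + 1))
        current = current.parent
    parts.append("0")  # the root is always "0" in this scheme
    return "_".join(reversed(parts))


def assign_ids(node: Node) -> None:
    """Set ident on the node and on all of its descendants."""
    node.ident = path_id(node)
    for child in node.children:
        assign_ids(child)


top = Node("Top level")
second = top.add(Node("Second level"))
third = second.add(Node("Third level"))
second_ii = top.add(Node("Second level ii"))

assign_ids(top)
ids = [n.ident for n in (top, second, third, second_ii)]
print(ids)  # ['0', '0_1', '0_1_1', '0_2']
```

Because `path_id` depends only on position, not on an id already stored on the parent, the result in this sketch is the same whether nodes are visited parents first or children first.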