udapi / udapi-python

Python framework for processing Universal Dependencies data
GNU General Public License v3.0

export fragments of trees from a query to LaTeX #93

Closed: arademaker closed this issue 3 years ago

arademaker commented 3 years ago

Hi @martinpopel, I am trying to find examples in the corpus and copy tree fragments into an article. I know that I can do that with udapi, but I could not find how in the tutorial or in the docs. Can you help me? What command-line call would list the fragments in LaTeX, and which LaTeX code and packages would I need to make it work in the LaTeX document?

This is the kind of command I used to show the fragments in the text format:

cat documents/*.conllu | udapy -q util.Eval node='if node.deprel == "nsubj" and node.parent.upos == "ADJ" and (node.feats["Gender"] != node.parent.feats["Gender"] or node.feats["Number"] != node.parent.feats["Number"]) : node.draw()'

See https://github.com/UniversalDependencies/UD_Portuguese-Bosque/issues/314
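For readers following the query, the condition in the command above can be read as a standalone predicate. The sketch below is only an illustration: the name is_agreement_mismatch and the Feats stand-in class are hypothetical, and the toy objects are not real udapi nodes. It assumes that a missing feature reads as an empty string, which is how I understand node.feats to behave in udapi, but please verify that against the udapi docs.

```python
from types import SimpleNamespace


class Feats(dict):
    """Hypothetical stand-in for a node's features: a missing feature reads as ''."""

    def __missing__(self, key):
        return ""


def is_agreement_mismatch(node) -> bool:
    """True for an nsubj whose ADJ parent differs from it in Gender or Number."""
    parent = node.parent
    return (
        node.deprel == "nsubj"
        and parent.upos == "ADJ"
        and (
            node.feats["Gender"] != parent.feats["Gender"]
            or node.feats["Number"] != parent.feats["Number"]
        )
    )


# Toy stand-ins (assumptions for illustration only, not udapi objects).
adj = SimpleNamespace(upos="ADJ", feats=Feats(Gender="Masc", Number="Sing"))
subj_ok = SimpleNamespace(
    deprel="nsubj", parent=adj, feats=Feats(Gender="Masc", Number="Sing")
)
subj_bad = SimpleNamespace(
    deprel="nsubj", parent=adj, feats=Feats(Gender="Fem", Number="Sing")
)
# A subject with no Gender feature at all is also reported under this condition,
# because '' != "Masc".
subj_no_gender = SimpleNamespace(
    deprel="nsubj", parent=adj, feats=Feats(Number="Sing")
)
```

Under that assumption, a subject or adjective that simply lacks Gender or Number is reported as a mismatch too, so some hits may be underspecified rather than genuinely disagreeing.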

Once I have found a good example, I would like to produce LaTeX from the fragment.

martinpopel commented 3 years ago

First, you can use util.Mark instead of util.Eval and udapy -TM (see udapy -h for help) instead of node.draw(). This way, the subject node will be marked (highlighted):

cat documents/*.conllu | udapy -TM util.Mark node='node.deprel == "nsubj" and node.parent.upos == "ADJ" and (node.feats["Gender"] != node.parent.feats["Gender"] or node.feats["Number"] != node.parent.feats["Number"])' | less -R

If you like this kind of TextModeTrees visualization, you can include it in a LaTeX \begin{verbatim}...\end{verbatim} environment, as described here: https://github.com/udapi/udapi-python/blob/36834/udapi/block/write/textmodetrees.py#L130-L136 You will need to exclude the colors (udapy -TMN ... > sample.tex). It would be nice to have write.TextModeTreesLatex (similar to write.TextModeTreesHtml), which would support the colors and highlighting (PRs are welcome).
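A rough sketch of the verbatim route, in case the tree drawings were already saved with colors: the helper below is not part of udapi, and the names strip_ansi and to_latex_verbatim are hypothetical. It assumes the colors are ordinary ANSI SGR escape sequences, removes them, and wraps the remaining text in a verbatim environment. The LaTeX preamble needed for the box-drawing characters is described at the link above and is not reproduced here.

```python
import re

# Matches ANSI SGR (color/style) escape sequences such as "\x1b[33m" or "\x1b[0m".
# Assumption: the colored output uses only sequences of this form.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Remove ANSI color/style escape sequences from text."""
    return ANSI_SGR.sub("", text)


def to_latex_verbatim(tree_text: str) -> str:
    """Wrap plain-text tree drawings in a LaTeX verbatim environment."""
    body = strip_ansi(tree_text).rstrip("\n")
    # A literal \end{verbatim} inside the body would close the environment early.
    if "\\end{verbatim}" in body:
        raise ValueError("input contains \\end{verbatim}; cannot wrap it safely")
    return "\\begin{verbatim}\n" + body + "\n\\end{verbatim}\n"
```

For example, calling to_latex_verbatim on the contents of a file holding the saved output gives a snippet that can be pasted into the article. If the output is produced with the N option in the first place, strip_ansi simply leaves the text unchanged.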

If you prefer tikz-dependency visualization, you can use write.Tikz. However, unlike write.TextModeTrees, this writer block does not support marked_only=1, so you have to filter the trees you are interested in using util.Filter:

cat documents/*.conllu | udapy \
 util.Filter keep_tree_if_node='node.deprel == "nsubj" and node.parent.upos == "ADJ" and (node.feats["Gender"] != node.parent.feats["Gender"] or node.feats["Number"] != node.parent.feats["Number"])' \
 write.Tikz > sample.tex
pdflatex sample.tex

martinpopel commented 3 years ago

Oh, and if you prefer to print only the fragments instead of whole trees, you can use keep_subtree instead of keep_tree_if_node.