udapi / udapi-python

Python framework for processing Universal Dependencies data
GNU General Public License v3.0

how to use REGEX in queries like... #91

Closed arademaker closed 3 years ago

arademaker commented 3 years ago
% awk '$2 ~ /^[A-Z]\.$/' *.conllu

maybe something like ~= below?

% cat documents/*.conllu | udapy -q util.Eval node='if (node.form ~= "^[A-Z]\.$"): node.draw(attributes="form,lemma,upos,feats")'

martinpopel commented 3 years ago

This is Python, so just use re.match("[A-Z]\.$", node.form) instead of (node.form ~= "^[A-Z]\.$"). Blocks util.Eval and util.Mark already include import re. Note that the initial ^ is not needed in re.match (unlike re.search).
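
To make the anchoring point concrete, here is a small sketch using only the standard `re` module. The list of word forms is made up for illustration and does not come from any treebank.

```python
import re

# Raw string so that \. reaches the regex engine unchanged:
# one capital letter, a literal dot, then end of string.
pattern = r"[A-Z]\.$"

# Hypothetical word forms, invented for this example.
forms = ["A.", "B.", "Mr.", "a.", "A", "U.S."]

# re.match only tries to match at the start of the string,
# so a leading ^ is redundant.
matched = [f for f in forms if re.match(pattern, f)]

# re.search scans the whole string, so without ^ it also accepts
# forms that merely end in a capital letter plus dot.
searched = [f for f in forms if re.search(pattern, f)]

# With an explicit ^, re.search selects the same forms as re.match here.
anchored = [f for f in forms if re.search("^" + pattern, f)]

print(matched)   # ['A.', 'B.']
print(searched)  # ['A.', 'B.', 'U.S.']
print(anchored)  # ['A.', 'B.']
```

So inside a `util.Eval` or `util.Mark` expression, `re.match` is the closer equivalent of the awk test on the whole field.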

BTW: util.Mark may also be useful here:

cat documents/*.conllu | udapy -qTMA util.Mark node='re.match("[A-Z]\.$", node.form)' | less -R
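
For comparison, the same filter can be sketched in plain Python without udapi, mirroring the awk one-liner from the question. This assumes standard CoNLL-U input (tab-separated columns, FORM in the second column, comment lines starting with `#`). The function name `matching_token_lines` and the sample lines are hypothetical.

```python
import re

def matching_token_lines(lines, pattern=r"[A-Z]\.$"):
    """Yield CoNLL-U token lines whose FORM column matches the pattern."""
    regex = re.compile(pattern)
    for line in lines:
        line = line.rstrip("\n")
        # Skip sentence separators (blank lines) and comment lines.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # FORM is the second column; match() anchors at its start.
        if len(columns) > 1 and regex.match(columns[1]):
            yield line

# Invented sample data, not taken from a real treebank.
sample = [
    "# text = J. Smith",
    "1\tJ.\tJ.\tPROPN\t_\t_\t0\troot\t_\t_",
    "2\tSmith\tSmith\tPROPN\t_\t_\t1\tflat\t_\t_",
    "",
]

hits = list(matching_token_lines(sample))
for hit in hits:
    print(hit)
```

Unlike the udapy commands above, this only prints the raw token lines; it does not draw trees or highlight nodes.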