rug-compling / dact

Decaffeinated Alpino Corpus Tool
https://rug-compling.github.io/dact/
GNU Lesser General Public License v2.1

parameterized macros #111

Open gertjanvannoord opened 12 years ago

gertjanvannoord commented 12 years ago

Consider these macros, which find all words that can occur as the OBJ1 of the verb DRINKEN:

obj1_drinken_lexical = """ ( @rel="obj1" and @word and ../node[@rel="hd" and @lemma="drinken"] )"""

obj1_drinken_phrase = """ ( @rel="hd" and ../@rel="obj1" and ../../node[@rel="hd" and @lemma="drinken"] )"""

obj1_drinken_lexical_nonlocal = """ ( (@cat or @word) and %i% = //node[@rel="obj1" and ../node[@rel="hd" and @lemma="drinken"]]/%i% )"""

obj1_drinken_phrase_nonlocal = """ ( @rel="hd" and ../%i% = //node[@rel="obj1" and ../node[@rel="hd" and @lemma="drinken"]]/%i% )"""

obj1_drinken = """ ( %obj1_drinken_lexical% or %obj1_drinken_phrase% or %obj1_drinken_lexical_nonlocal% or %obj1_drinken_phrase_nonlocal% ) """

If we want to do the same thing for "eten", we need another page of macros. I want to be able to say

%dependent("obj1","drinken")%

and then define

dependent(Rel,Head) = """ ( dependent_lexical(%Rel%,%Head%) or dependent_phrase(%Rel%,%Head%) or ....

etc

"""
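A minimal sketch of what such a parameterized `dependent` macro could expand to, written in Python purely for illustration; it is not dact's macro syntax or implementation. The function names mirror the macro names proposed above, and the bodies are the four `obj1_drinken_*` macros with the relation and the lemma turned into parameters. `%i%` is left as-is, on the assumption that it remains an ordinary macro reference expanded elsewhere.

```python
# Hypothetical Python stand-ins for the proposed parameterized macros.
# Each function returns the XPath fragment for a given relation and head lemma.

def dependent_lexical(rel: str, head: str) -> str:
    return f'( @rel="{rel}" and @word and ../node[@rel="hd" and @lemma="{head}"] )'


def dependent_phrase(rel: str, head: str) -> str:
    return (f'( @rel="hd" and ../@rel="{rel}" and '
            f'../../node[@rel="hd" and @lemma="{head}"] )')


def dependent_lexical_nonlocal(rel: str, head: str) -> str:
    # %i% is kept verbatim: it is assumed to be a separate, existing macro.
    return ('( (@cat or @word) and %i% = '
            f'//node[@rel="{rel}" and ../node[@rel="hd" and @lemma="{head}"]]/%i% )')


def dependent_phrase_nonlocal(rel: str, head: str) -> str:
    return ('( @rel="hd" and ../%i% = '
            f'//node[@rel="{rel}" and ../node[@rel="hd" and @lemma="{head}"]]/%i% )')


def dependent(rel: str, head: str) -> str:
    # Disjunction of the four variants, as in the obj1_drinken macro.
    variants = (dependent_lexical, dependent_phrase,
                dependent_lexical_nonlocal, dependent_phrase_nonlocal)
    return "( " + " or ".join(f(rel, head) for f in variants) + " )"


if __name__ == "__main__":
    print(dependent("obj1", "drinken"))
    print(dependent("obj1", "eten"))
```

With this shape, the "eten" case needs no additional definitions: only the arguments change.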

danieldk commented 12 years ago

I think we need to parse to a proper AST to be able to do this. Since playing with and testing ASTs in C++ is a real drag, I did a quick write-up in Haskell of a proposed AST format, showing how variable substitution and macro calls could be applied:

https://gist.github.com/3742284

@jelmervdl any comments?

danieldk commented 12 years ago

BTW, a sample invocation:

% ghci -Wall -XOverloadedStrings test.hs 
*Main> applyMacro testMacros testMacro2 ["20"]
Just [StringChunk "bar",StringChunk "foo",StringChunk "20"]
*Main> callMacro testMacros testMacro2 ["20"]
Just "barfoo20"
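
For readers without the gist at hand, here is a rough Python sketch of the same idea: macro bodies are parsed into chunks (literal text, parameter references, nested macro calls), and expansion reduces everything to literal chunks. The class names, the `Param`/`Call` node types, and the definitions of `testMacro1` and `testMacro2` are assumptions chosen to reproduce the output above; they are not taken from the gist.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass(frozen=True)
class StringChunk:
    text: str                      # literal query text


@dataclass(frozen=True)
class Param:
    index: int                     # reference to the macro's n-th argument


@dataclass(frozen=True)
class Call:
    name: str                      # macro being invoked
    args: Tuple[Union[StringChunk, Param], ...]


Chunk = Union[StringChunk, Param, Call]


@dataclass(frozen=True)
class Macro:
    arity: int
    body: Tuple[Chunk, ...]


def apply_macro(macros: Dict[str, Macro], name: str,
                args: List[str]) -> Optional[List[StringChunk]]:
    """Expand a macro into literal chunks; None on unknown name or wrong arity."""
    macro = macros.get(name)
    if macro is None or len(args) != macro.arity:
        return None
    out: List[StringChunk] = []
    for chunk in macro.body:
        if isinstance(chunk, StringChunk):
            out.append(chunk)
        elif isinstance(chunk, Param):
            out.append(StringChunk(args[chunk.index]))
        else:
            # Resolve the call's arguments in the caller's scope, then recurse.
            # Note: no cycle detection; a self-referential macro would recurse forever.
            inner = [a.text if isinstance(a, StringChunk) else args[a.index]
                     for a in chunk.args]
            expanded = apply_macro(macros, chunk.name, inner)
            if expanded is None:
                return None
            out.extend(expanded)
    return out


def call_macro(macros: Dict[str, Macro], name: str,
               args: List[str]) -> Optional[str]:
    chunks = apply_macro(macros, name, args)
    return None if chunks is None else "".join(c.text for c in chunks)


# Assumed test macros: testMacro1(x) = "foo" x ; testMacro2(x) = "bar" testMacro1(x)
test_macros = {
    "testMacro1": Macro(1, (StringChunk("foo"), Param(0))),
    "testMacro2": Macro(1, (StringChunk("bar"), Call("testMacro1", (Param(0),)))),
}

if __name__ == "__main__":
    print(apply_macro(test_macros, "testMacro2", ["20"]))
    print(call_macro(test_macros, "testMacro2", ["20"]))
```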

jelmervdl commented 12 years ago

Looks ok to me. I don't really have anything to add.

danieldk commented 12 years ago

I thought a bit more about this; there are some annoyances in implementing this:

One solution that simplifies things a bit is to move macro invocations out of the query, using string interpolation à la Python. E.g.:

a(x) = """//node[%s and %s]""" % b(x), c(x)

Another solution is string concatenation:

a(x) = """//node[""" + b(x) + """ and """ + c(x) + """]"""
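
For comparison, both variants can be tried out directly in Python. This is only a sketch of the two proposals, not dact syntax: `b` and `c` are hypothetical helper macros invented for the example, and in real Python the interpolation arguments must be wrapped in a tuple.

```python
# Hypothetical helper macros; the attribute tests are placeholders.
def b(x: str) -> str:
    return '@lemma="%s"' % x


def c(x: str) -> str:
    return '@root="%s"' % x


def a_interpolated(x: str) -> str:
    # Macro invocations live outside the query string; %s marks the slots.
    return """//node[%s and %s]""" % (b(x), c(x))


def a_concatenated(x: str) -> str:
    # Same result, built by joining literal pieces and macro results.
    return """//node[""" + b(x) + """ and """ + c(x) + """]"""


if __name__ == "__main__":
    print(a_interpolated("drinken"))
    print(a_concatenated("drinken"))
```

In either form the query text itself never has to be scanned for macro calls, which is the simplification both proposals are after.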

Though I am still wondering whether there is an off-the-shelf solution that we can use...