letuananh / intsem.fx

A Python 3 implementation of the Integrated Semantic Framework that provides computational deep semantic analysis by combining structural semantics from construction grammars and lexical semantics from ontologies in a single representation.
https://osf.io/9udjk/
MIT License

code for programmatically getting the senses for each predicate #22

Closed arademaker closed 3 years ago

arademaker commented 3 years ago

The code in the README file shows how to print a specific reading of a sentence, but I want to go further with the sense-augmented DMRS. How can I get the predicates and their senses? ISF has a wrapper around the https://pydelphin.readthedocs.io/en/latest/ classes, right? To which objects are the senses attached? Can you expand the code below to show how to iterate over the predicates of a reading and get the senses associated with each?

from coolisf import GrammarHub
ghub = GrammarHub()
# parse an English text
sent = ghub.ERG_ISF.parse("I love drip coffee.")
# print semantic structures for all potential readings
for reading in sent:
    print(reading.dmrs())

A complementary question: are the original predicates still available alongside the ones ISF produces for MWEs? For example, can I still get green and tea besides green+tea?

arademaker commented 3 years ago

Hi @letuananh , any idea here?

letuananh commented 3 years ago

@arademaker The ISF has its own DMRS model but does contain functions to make integration with PyDelphin possible. I'll try to improve the documentation as soon as possible, but for now you can refer to the code here: https://github.com/letuananh/intsem.fx/blob/4a59ea40e05d686ae5d077eff22ff1248d61c1e8/coolisf/model.py#L558

There are two different parse() functions: the grammar's parse() and ghub.parse(). The latter will tag sentences automatically if a tagger is given, while the former only generates the MRSes and nothing more. If you want to sense-tag the sentences, you can use either of the following:

  1. Using the sent.tag_xml() function
from coolisf import GrammarHub

ghub = GrammarHub()

sent = ghub.ERG_ISF.parse("I love drip coffee.")
sent.tag_xml(method="lelesk")  # tag the predicates using LeLesk
for idx, reading in enumerate(sent, start=1):
    print(f"Reading #{idx}")
    if reading.dmrs().tags:
        for nodeid, tags in reading.dmrs().tags.items():
            tag_str = ', '.join('{}[{}/{}]'.format(s.ID, s.lemma, m) for s, m in tags)
            print("# {} -> {}".format(nodeid, tag_str))
  2. Using the ghub.parse() function
from coolisf import GrammarHub
from lelesk import LeLeskWSD

wsd = LeLeskWSD()
ghub = GrammarHub()

sent = ghub.parse("I love drip coffee.", grm="ERG_ISF", tagger="lelesk", wsd=wsd)
for reading in sent:
    print(reading.dmrs())
    if reading.dmrs().tags:
        for nodeid, tags in reading.dmrs().tags.items():
            tag_str = ', '.join('{}[{}/{}]'.format(s.ID, s.lemma, m) for s, m in tags)
            print("# {} -> {}".format(nodeid, tag_str))
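To make the tag-printing loops above easier to follow, here is a stdlib-only sketch of the data shape they consume. The Synset stand-in and the synset IDs below are illustrative, not the real coolisf.model classes: as far as the examples show, dmrs().tags maps a node ID to a list of (synset, tagging-method) pairs, where each synset carries an ID and a lemma.

```python
from collections import namedtuple

# Hypothetical stand-in for coolisf's synset objects (illustrative only).
Synset = namedtuple("Synset", ["ID", "lemma"])

# Illustrative tags mapping: node ID -> list of (synset, method) pairs,
# mirroring what reading.dmrs().tags holds after sense-tagging.
tags = {
    10001: [(Synset("01777210-v", "love"), "lelesk")],
    10003: [(Synset("07919572-n", "drip coffee"), "lelesk")],
}

# Same formatting logic as the loops above.
for nodeid, senses in tags.items():
    tag_str = ', '.join('{}[{}/{}]'.format(s.ID, s.lemma, m) for s, m in senses)
    print("# {} -> {}".format(nodeid, tag_str))
```

Running this prints one line per tagged node, e.g. `# 10001 -> 01777210-v[love/lelesk]`.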

Note: when we call python -m coolisf text "I like drip coffee." in the terminal, it invokes this function: https://github.com/letuananh/intsem.fx/blob/4a59ea40e05d686ae5d077eff22ff1248d61c1e8/coolisf/main.py#L222. Lines 237--240 show how to get the synsets and the predicate links out. You may refer to it for more information on how the ISF is used.
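To answer the original question about iterating over predicates, the synset/predicate extraction essentially boils down to joining the DMRS nodes (node ID to predicate) with the sense tags (node ID to synsets). The dicts below are hypothetical stand-ins for the real coolisf objects, just to show the join:

```python
# Illustrative node table: DMRS node ID -> predicate name (not real output).
nodes = {
    10001: "_love_v_1",
    10003: "_coffee_n_1",
}

# Illustrative sense tags: node ID -> list of (synset ID, tagging method).
tags = {
    10001: [("01777210-v", "lelesk")],
    10003: [("07919572-n", "lelesk")],
}

# Join the two on node ID to get (predicate, synset ID) links.
links = [(pred, synset_id)
         for nodeid, pred in nodes.items()
         for synset_id, method in tags.get(nodeid, [])]
print(links)
```

Each predicate without an entry in the tags mapping simply contributes no links, so untagged nodes are skipped.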

I hope this helps. Please let me know if you need anything else. Have a nice day :)