opencog / relex

English Dependency Relationship Extractor
http://wiki.opencog.org/w/RelEx
Apache License 2.0

Incorrect creation of _det dependency relation #230

Open williampma opened 8 years ago

williampma commented 8 years ago

_det is triggered incorrectly.

Link Grammar parse diagram:

    +---------------Xp--------------+       
    +-------->WV-------->+          |       
    +-----Wd-----+       |          |       
    |      +Ds**v+---Ss--+-Paf-+    +--RW--+
    |      |     |       |     |    |      |
LEFT-WALL an Italian.s is.v crazy.a . RIGHT-WALL 

Phrase Structure parse:

    (S (NP an Italian.s) (VP is.v (ADJP crazy.a)) .)

Dependency relations:

    _det(Italian, an)
    _predadj(Italian, crazy)

Attributes:

    penn-POS(be, VBZ)
    subscript-TAG(be, .v)
    pos(be, verb)
    tense(crazy, present)
    penn-POS(crazy, JJ)
    subscript-TAG(crazy, .a)
    pos(crazy, adj)
    definite-FLAG(Italian, T)
    penn-POS(Italian, NN)
    subscript-TAG(Italian, .s)
    pos(Italian, noun)
    noun_number(Italian, singular)
    penn-POS(an, DT)
    pos(an, det)
    pos(., punctuation)

A bit related to https://github.com/opencog/relex/issues/211 and https://github.com/opencog/relex/issues/212 no doubt.

linas commented 8 years ago

? what's the problem? the output looks right to me ... what were you expecting?

williampma commented 8 years ago

Expecting no _det(Italian, an).

_det is not generated for sentences like "An orange is tasty." So something is different about the sentence "An Italian is crazy." in particular.

R2L uses _det only for "those", "these", "this", "that". Unless R2L is also wrong.
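One way to compare the two sentences mechanically is to parse the relation lines RelEx prints and check whether a _det relation is present. This is only a minimal sketch, assuming the `name(first, second)` line format shown in the output above; `parse_relations` and `has_relation` are names made up for illustration and are not part of RelEx.

```python
import re

# One relation or attribute per line, e.g. "_det(Italian, an)".
# The line format is an assumption based on the output pasted in this issue.
RELATION_RE = re.compile(r"^\s*([\w-]+)\((.+?),\s*(.+?)\)\s*$")


def parse_relations(text):
    """Return (name, first_arg, second_arg) tuples for every matching line."""
    relations = []
    for line in text.splitlines():
        match = RELATION_RE.match(line)
        if match:
            relations.append(match.groups())
    return relations


def has_relation(relations, name):
    """Return True if any parsed relation carries the given name."""
    return any(rel[0] == name for rel in relations)


# Dependency relations reported for "An Italian is crazy."
reported = """
    _det(Italian, an)
    _predadj(Italian, crazy)
"""
relations = parse_relations(reported)
print(has_relation(relations, "_det"))
```

Running the same check on the output for "An orange is tasty." would show whether the two sentences really differ.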

linas commented 8 years ago

_det should be generated for "An orange is tasty." and any other sentence with a, an, the, this, these, those ... Not sure what R2L has to do with this?

williampma commented 8 years ago

Just that one of the R2L rules accepts the _det relation.

If _det is supposed to be for a, an, the as well, then the R2L is bugged.

linas commented 8 years ago

!? Well, a determiner is a determiner. For disembodied text, determiners offer little information, and it's possible that relex tried to avoid generating them. However, for embodied chat, being able to understand determiners is critical.

BTW, it turns out that depressed people use determiners much more often, it's even a sign of suicidality: instead of saying "the book is on top of the table", they will say "this is on top of that", as if unable to fully delineate the objects being talked about.

amebel commented 8 years ago

why put the effort if one is planning to die? that is rational :laughing:

anitzkin commented 8 years ago

Lol. Perhaps it reflects a lack of caring about specifics? I mean if you're depressed, who cares what's on top of what? It's all the same?

About _det ... as I recall, and I am not sure I recall correctly, when I found it, it was being used only for this, that, those, and these. That of course really annoyed me, because not only are those not the only determiners, but they are also demonstratives, which require special treatment: they call on an R2L scheme helper which creates a variable to express the fact that they should induce the system to identify "which" referent they are pointing at. Which is all well and good, but it is not consistently integrated with the treatment of other determiners, which as far as I know only trigger the tag "indefinite" or "definite" to be attached to the node for the noun in question.

I believe (but I'm not sure I remember correctly) that I partially revised that aspect of relex / R2L, and then we rolled back the revisions because I had broken the treatment of "this" "that" etc. and then the whole issue was left hanging. Probably someone needs to proceed with generalizing _det for all determiners, and creating an additional relation for "demonstrativeness."
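The proposal in the last sentence could look roughly like the sketch below: every determiner yields _det, and demonstratives additionally yield a second relation. This is not RelEx or R2L code. The word lists, the function `determiner_relations`, and the relation name `_demonstrative` are all hypothetical, chosen only to illustrate the split.

```python
# Hypothetical word lists for illustration; not taken from RelEx or R2L.
ARTICLES = {"a", "an", "the"}
DEMONSTRATIVES = {"this", "that", "these", "those"}


def determiner_relations(noun, determiner):
    """Return relation strings for a determiner attached to a noun.

    Every known determiner yields _det. Demonstratives additionally yield
    a made-up _demonstrative relation that a downstream rule could key on
    instead of overloading _det.
    """
    word = determiner.lower()
    if word not in ARTICLES and word not in DEMONSTRATIVES:
        return []
    relations = [f"_det({noun}, {determiner})"]
    if word in DEMONSTRATIVES:
        relations.append(f"_demonstrative({noun}, {determiner})")
    return relations


print(determiner_relations("Italian", "an"))
print(determiner_relations("book", "this"))
```

With this split, a rule that needs to resolve "which" referent is meant would match on the extra relation only, so plain articles would no longer be routed through the demonstrative handling.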

anitzkin commented 8 years ago

Sorry if that wasn't clear; yes, R2L is bugged. Because the only R2L _det rule, if things haven't changed, is the one that assigns variables to demonstrative determiner phrases.

I'm going to see if I can get a VMware opencog running on my laptop so I can do some relex / r2l work, as I am still unemployed at the moment :-) ...

linas commented 8 years ago

ahh, yes right ... if I recall correctly, a fuller support for _det is in the stanford compatibility files, and yes, originally, LG just stuck a definite/indefinite tag on the nouns. So to get stronger support for _det, you'd need to move it over from the stanford-compat files.