opencog / relex

English Dependency Relationship Extractor
http://wiki.opencog.org/w/RelEx
Apache License 2.0

R2L ForAllRule #88

Closed sebastianruder closed 10 years ago

sebastianruder commented 10 years ago

Hi all, I'm not quite clear on the semantics of the R2L ForAllRule. For the sentence "All Canadians are right-handed", it produces:

(ForAllLink (stv 0.990000 0.990000)
  (ConceptNode "Canadians@53ae4de2-ff7b-4f46-b47b-9dcc04b1f24c") ; [2270]
  (InheritanceLink (stv 0.990000 0.990000)
    (ConceptNode "Canadians@53ae4de2-ff7b-4f46-b47b-9dcc04b1f24c") ; [2270]
    (ConceptNode "Canadian") ; [2271]
  ) ; [2272]
) ; [2274]

For me, there is not much of a difference between this and

(InheritanceLink (stv 0.990000 0.990000)
  (ConceptNode "Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248") ; [2362]
  (ConceptNode "Canadian") ; [2363]
) ; [2364]

which is produced without the ForAllRule. Both signify for me that every instance of "Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248" inherits from "Canadian". What is the added value of the ForAllLink? IMO, the following representation would make more sense:

(ForAllLink
    (VariableNode "$X")
    (ImplicationLink
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248"))
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "right-handed"))))
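To make the difference concrete, here is a toy, purely extensional Python model. It assumes that concepts are crisp sets of individuals, whereas PLN truth values are probabilistic, and none of it uses the AtomSpace API. The individuals and sets are hypothetical. The current output only relates the sentence's Canadians to the generic "Canadian" concept, while the proposed ForAllLink actually constrains handedness.

```python
# Toy extensional model: a concept is just the set of individuals it covers.
# This is an illustration only; it does not use the OpenCog AtomSpace API.

def inherits(sub, sup):
    """Crisp reading of (InheritanceLink sub sup): every member of sub is in sup."""
    return sub <= sup

def for_all_implies(domain, antecedent, consequent):
    """Crisp reading of (ForAllLink $X (ImplicationLink (A $X) (B $X)))."""
    return all(consequent(x) for x in domain if antecedent(x))

# Hypothetical individuals and concept extensions.
domain = {"alice", "bob", "carol", "dave"}
canadians_in_sentence = {"alice", "bob"}  # stands in for Canadians@3c996520-...
canadian = {"alice", "bob", "carol"}      # stands in for ConceptNode "Canadian"
right_handed = {"alice", "bob", "dave"}   # stands in for ConceptNode "right-handed"

# Current output: true, but it says nothing about handedness.
print(inherits(canadians_in_sentence, canadian))  # True

# Proposed output: every Canadian in the sentence is right-handed.
print(for_all_implies(domain,
                      lambda x: x in canadians_in_sentence,
                      lambda x: x in right_handed))  # True
```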
linas commented 10 years ago

I think you want to say "every member of the class Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248".

The @abc123 was really meant to be a word-instance tag, so that if the next sentence says "they also say 'eh' a lot", we can figure out that "they" refers to Canadians.

These rules should probably generate different markup for different concepts and word senses, for example (ConceptNode "Canadians/43") (observe the slash), so that we can distinguish the right-handed Canadians who live in Canada and say 'eh' a lot from the left-handed Canadians who live in California and surf a lot, who are all (ConceptNode "Canadians/42").

By re-using the word-instance UUIDs for word senses, I think we are creating a lot of confusion, now and in the future. Theoretically, it's OK to re-use the UUIDs, but I worry that it will make things confusing...

Also, this doesn't look quite right:

(InheritanceLink (stv 0.990000 0.990000)
  (ConceptNode "Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248") ; [2362]
  (ConceptNode "Canadian") ; [2363]
)

it should be something like:

(InheritanceLink
    (ConceptNode "Canadians/43")
    (ConceptNode "Canadians/65")
)

where Canadians/65 is the set of all Canadians living anywhere and everywhere in the world, independent of their handedness.

bgoertzel commented 10 years ago


I think that

(ForAllLink
    (VariableNode "$X")
    (ImplicationLink
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "Canadians"))
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "right-handed"))))

would make more sense as an interpretation of "All Canadians are right-handed".

Or one could make a specialized version

(ForAllLink
    (VariableNode "$X")
    (ImplicationLink
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "Canadians@123"))
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "right-handed@456"))))

which means

"All Canadians (in the specific sense meant in this sentence) are right-handed (in the specific sense meant in this sentence)"

The specialized version causes less work for the system in cases like

"All Canadians are right-handed. And by that I mean, they all have right hands. Canadians NEVER amputate their citizens' right hands."

On the other hand, the specialized version is going to be pretty useless most of the time....

Given the specialized version, PLN should be able to produce the general version from

InheritanceLink canadian@123 canadian

InheritanceLink right-handed@456 right-handed

However, I think it makes sense for R2L to simply produce the general version automatically, because it's almost always going to be useful.

So, as in other cases, my suggestion is to have R2L generate both the specialized and the general version, but to assign the specialized version a lower LTI, so that it can be forgotten fairly quickly if it turns out to be of no use...
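A minimal sketch of that suggestion, in toy Python rather than Atomese. The `Atom` class, `emit_both`, `forget`, the LTI values, and the threshold are all hypothetical, and the simple threshold filter only stands in for whatever forgetting mechanism the system actually uses.

```python
from dataclasses import dataclass

@dataclass
class Atom:
    text: str  # Atomese expression, kept as a plain string for illustration
    lti: int   # long-term importance (hypothetical values below)

def for_all(concept, predicate):
    """Build the ForAllLink/ImplicationLink form discussed above as a string."""
    return ('(ForAllLink (VariableNode "$X") (ImplicationLink '
            f'(InheritanceLink (VariableNode "$X") (ConceptNode "{concept}")) '
            f'(InheritanceLink (VariableNode "$X") (ConceptNode "{predicate}"))))')

def emit_both(noun, noun_inst, adj, adj_inst, general_lti=100, specialized_lti=10):
    """Emit the general and the specialized reading; the specialized one gets a lower LTI."""
    return [Atom(for_all(noun, adj), general_lti),
            Atom(for_all(noun_inst, adj_inst), specialized_lti)]

def forget(atoms, threshold):
    """Crude stand-in for forgetting: keep only atoms whose LTI reaches the threshold."""
    return [a for a in atoms if a.lti >= threshold]

atoms = emit_both("Canadians", "Canadians@123", "right-handed", "right-handed@456")
for atom in forget(atoms, threshold=50):
    print(atom.text)  # only the general version survives
```

If the specialized version gets used before the forgetting pass (for instance by the "they all have right hands" follow-up), its importance would presumably be raised, and it would survive.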

However, all these comments are just about specialization vs. generalization of ConceptNodes. Your main point seemed to be about the need to use ForAllLink, and there I think you are totally correct...

-- Ben


Ben Goertzel, PhD http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James T. Kirk

"Emancipate yourself from mental slavery / None but ourselves can free our minds" -- Robert Nesta Marley

sebastianruder commented 10 years ago

Hi @bgoertzel and @linas, thank you for your feedback. I understand the points you make about disambiguation and generalization, and I will keep them in mind and adjust the output accordingly. From your replies, I infer that you agree with me that the ForAllRule needs to be rewritten to capture the relationship more appropriately. At the moment, I'm just unsure how to build the second argument of the ImplicationLink in Scheme so that it works for all possible sentences. I want to leverage what previous rules have already created. I guess I could use cog-incoming-set to retrieve already-created atoms involving the noun, but I'm not sure how to decide which one should be used in the implication. Do you have any suggestions?

linas commented 10 years ago

Hey Ben,

I'm starting to think that we should be generating word-senses directly as part of the early r2l processing. Working with word-senses would give us a simple middle-ground between "canadian@123" (this particular canadian) and ConceptNode canadian (the universe of all possible canadians). With this in mind, I added a section to the wiki page: http://wiki.opencog.org/w/Linguistic_Interpretation#The_Mihalcea_algo

Please review, I think you'll like it. It gives a hand-wavey description of how PLN fits in: PLN is used to help select/reinforce the most likely interpretation.

In the above, the initial word senses are taken from WordNet. We will also need to invent a mechanism to create new word senses as needed (e.g., maybe WordSenseNode "canadian@123", meaning "this particular canadian").

I'm somewhat confused about how WordSenseNodes and ConceptNodes interact ...

linas commented 10 years ago

Hi @sebastianruder, I don't understand your question. The file nlp/scm/nlp-utils.scm contains handy utilities for locating certain atoms given other atoms. These wrap cog-incoming-set up into handy-dandy, bite-size chunks.
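As a rough illustration of the kind of lookup involved, here is a toy Python stand-in for an atom store with incoming sets. `ToyAtomSpace`, `predicates_of`, and the shortened instance name are hypothetical and are not the API of nlp-utils.scm or of cog-incoming-set. The point is only that such a lookup can return several candidate links for the same noun instance, which is exactly the ambiguity raised above.

```python
from collections import defaultdict

class ToyAtomSpace:
    """Tiny stand-in for an atom store; hypothetical, not OpenCog's API."""

    def __init__(self):
        # node name -> list of links in which that node appears
        self.incoming = defaultdict(list)

    def add_link(self, link_type, *outgoing):
        link = (link_type, outgoing)
        for atom in outgoing:
            self.incoming[atom].append(link)
        return link

    def incoming_set(self, node):
        return list(self.incoming[node])

def predicates_of(space, noun_instance):
    """Second arguments of InheritanceLinks whose first argument is noun_instance."""
    return [outgoing[1]
            for link_type, outgoing in space.incoming_set(noun_instance)
            if link_type == "InheritanceLink" and outgoing[0] == noun_instance]

space = ToyAtomSpace()
space.add_link("InheritanceLink", "Canadians@3c99", "Canadian")
space.add_link("InheritanceLink", "Canadians@3c99", "right-handed")
print(predicates_of(space, "Canadians@3c99"))  # ['Canadian', 'right-handed']
```

Both candidates come back, so a rule built this way would still need some extra criterion (for example, which rule produced each link) to pick the one that belongs in the implication.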

bgoertzel commented 10 years ago

Interesting...

Yes, of course doing Mihalcea style WSD along with R2L processing would be great....

Using PLN as you describe makes perfect sense, yet would of course be a bit of a project on its own; PLN is not mature enough that you could just plug it in and count on it to do this well without any filling-in of missing pieces...

(In reply to Linas Vepstas's comment of Thu, Jul 10, 2014, above.)


sebastianruder commented 10 years ago

@linas, my question was quite specific: I was wondering how to generate the following output, specifically the second InheritanceLink, from the ForAllRule

(ForAllLink
    (VariableNode "$X")
    (ImplicationLink
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248"))
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "right-handed"))))

directly in the rule, without having to modify additional rules, since that InheritanceLink depends on other rules (SV, SVO, be-inheritance, adj) to produce (InheritanceLink (VariableNode "$X") (ConceptNode "right-handed")). Otherwise, this would require a post-processing step that merges the links and generates the ForAllLink, which would make things more convoluted.

Concerning WSD, I've reviewed the original paper (Unsupervised Graph-based Word Sense Disambiguation Using Measures of Word Semantic Similarity) and the README, and I find this very interesting. I would like to wait, though, to hear what @cosmoharrigan has to say. If we decide to discuss this further, we should do so in the thread where you've already alluded to this topic (https://github.com/opencog/relex/issues/87) or in a separate one, so that things won't get mixed up.

linas commented 10 years ago

Re: the ForAllLink: I don't know; I haven't studied the details of how these rules work. Amen is the one to ask.

linas commented 10 years ago

Re: WSD: Yes, a different thread would be good. Yes, that's the right paper. In the long run, we'd keep very little or none of the actual Mihalcea algo, as it gets replaced by PLN. What I really wanted to emphasize was how to visualize, and how to think about, having multiple word senses available at the same time, and how to pick some and reject others. That is, the key is to understand Figure 1 and the general concept. All the actual details will be quite different in the end.

ruiting commented 10 years ago

@sebastianruder Our original plan for the ForAll issue was to apply the SV/SVO/BE-INHERITANCE... rules and the ForAll rule separately, then use some post-processing to rewrite the graph, similar to the relative-clause issue we discussed before. But recently, we decided to get rid of the graph-rewriting part of the post-processing and to deal with this in a more elegant way once PLN is more mature (see http://wiki.opencog.org/w/RelEx2Logic#What_Direction_to_Take.3F). So for now, we'll just write a list of separate ForAll rules to get the basic things working first. Thanks.

Rodas or William could add some simple ForAll rules soon.

sebastianruder commented 10 years ago

Thanks for this information, Ruiting. Let me know if I can be of any help. So the new ForAllRule would produce the output cited above for the sentence "All Canadians are right-handed". Is that correct?

ruiting commented 10 years ago

Yes, the idea is just to write some specialized ForAll rules, give them priority over the SVO/SV/SVP... rules, and make them mutually exclusive with those rules. If you are in a hurry and also interested, you can just add the rules you need. Rodas is busy with the R2L rules for conjunctions right now, which are also very useful for testing syllogisms. Thanks.
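One simple way to realize "prior and mutually exclusive" is a first-match-wins rule list. The Python sketch below is hypothetical: the rule functions, the relation tuples (assumed to look like RelEx's _predet and _predadj output), and the simplified output strings are illustrations, not the actual R2L rule engine.

```python
# Relations are (name, head, dependent) tuples; names follow RelEx conventions,
# but the exact output for a given sentence is assumed here.

def all_svp_rule(rels):
    """Specialized rule: 'all' + predicative adjective -> universal implication."""
    quantified = {head for name, head, dep in rels if name == "_predet" and dep == "all"}
    out = [f"ForAll $X: ({head} $X) -> ({dep} $X)"
           for name, head, dep in rels if name == "_predadj" and head in quantified]
    return out or None

def svp_rule(rels):
    """General rule: predicative adjective -> plain inheritance."""
    out = [f"Inheritance {head} {dep}" for name, head, dep in rels if name == "_predadj"]
    return out or None

RULES = [all_svp_rule, svp_rule]  # ordered by priority: specialized rules first

def apply_rules(rels):
    """First match wins, so a sentence handled by the ForAll rule never also
    gets the plain InheritanceLink from the general rule."""
    for rule in RULES:
        result = rule(rels)
        if result is not None:
            return result
    return []

sentence = [("_predet", "Canadian", "all"), ("_predadj", "Canadian", "right-handed")]
print(apply_rules(sentence))  # ['ForAll $X: (Canadian $X) -> (right-handed $X)']
```

This is deliberately coarse: exclusivity is applied per sentence, so a real rule set with several clauses per sentence would more likely apply it per matched relation.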

sebastianruder commented 10 years ago

Thanks, @ruiting. I added an all-SVP-rule which combines the all-rule and the SVP-rule and produces the output that is needed for PLN reasoning: https://github.com/opencog/opencog/pull/924 and https://github.com/opencog/relex/pull/115.

Another thing: what's the difference between ALLRULE1 and ALLRULE2? They seem to be identical, since the placeholder in the _poss relation is not used.

ruiting commented 10 years ago

@sebastianruder ALLRULE2 doesn't seem to be implemented yet. The original idea behind writing separate rules for all + possessive was to handle the restricted-scope problem. For example,

All my writings are sad.

==>

(ForAllLink $X
    (ImplicationLink
        (AndLink
            (InheritanceLink $X writing)
            (PossessionLink $X me))
        (InheritanceLink $X sad)))

instead of

(ForAllLink $X
    (ImplicationLink
        (InheritanceLink $X writing)
        (InheritanceLink $X sad)))

(PossessionLink writing me)
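A toy Python check (hypothetical data, crisp set semantics assumed) shows why the scope matters. If I own only some of the writings, the restricted reading can be true while the unrestricted one, which quantifies over all writings, is false.

```python
def restricted_reading(domain, writing, mine, sad):
    """ForAll $X: (writing $X AND mine $X) -> sad $X."""
    return all(x in sad for x in domain if x in writing and x in mine)

def unrestricted_reading(domain, writing, sad):
    """ForAll $X: writing $X -> sad $X (possession stated separately)."""
    return all(x in sad for x in domain if x in writing)

# Hypothetical world: three writings, two of them mine, and only mine are sad.
domain = {"w1", "w2", "w3", "me"}
writing = {"w1", "w2", "w3"}
mine = {"w1", "w2"}
sad = {"w1", "w2"}

print(restricted_reading(domain, writing, mine, sad))  # True
print(unrestricted_reading(domain, writing, sad))      # False: w3 is not sad
```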

linas commented 10 years ago

Explanations like this should be inlined with the code, so that anyone reading the code also sees the explanation immediately next to it.

sebastianruder commented 10 years ago

Good point. I added a comment in the relex commit: #115.

williampma commented 10 years ago

"PossessionLink" doesn't exist. Should it be similar to the

(EvaluationLink df-link-stv
        (PredicateNode "Possession" df-node-stv)
        (ListLink df-link-stv
            (ConceptNode noun_instance df-node-stv)
            (ConceptNode word_instance df-node-stv)
        )
    )

created by possessive-rule?

If @rodsol isn't working on it, maybe I will take a hack at it, since that-rule is mostly done and I am looking for the next thing to work on :)

ruiting commented 10 years ago

I have no idea what principle we should follow to decide when a new link type should be added or avoided...

@williampma As far as I know, Rodas is currently working on conjunctions, including fixing the related RelEx bugs and R2L rules. So you can go ahead with it. Thanks...

bgoertzel commented 10 years ago

Replying to Ruiting Lian (Tue, Jul 15, 2014):

It's kind of a vague principle. The philosophical idea is that link types should represent fundamental mathematical or semantic primitives... "Possession" is a borderline case, as it is generally considered a semantic primitive...

http://en.wikipedia.org/wiki/Semantic_primes

ben



williampma commented 10 years ago

It looks like PossessionLink existed at one point: https://github.com/opencog/opencog/commit/cd0dc901d94e5fafd9ab2d75164175ae1471115d and https://github.com/opencog/opencog/blob/b4f951beab312a001fc98d7f76eb4cffc6311289/opencog/nlp/scm/relex-to-logic.scm#L141

Weird... so what happened, @AmeBel? Was there some discussion at some point about getting rid of it and using PredicateNode "Possession" instead?

sebastianruder commented 10 years ago

For me, the main goal of this issue has been reached, so this can be closed when you're finished with your discussion.

One thing, though: When we have this link

(ForAllLink
    (VariableNode "$X")
    (ImplicationLink
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248"))
        (InheritanceLink
            (VariableNode "$X")
            (ConceptNode "right-handed"))))

this one shouldn't be there, should it?

(InheritanceLink
    (ConceptNode "Canadians@3c996520-c6d6-41b7-9f03-d87c9407f248")
    (ConceptNode "right-handed"))
williampma commented 10 years ago

Yes, I think the rules should have stayed exclusive, according to https://github.com/opencog/relex/pull/115#issuecomment-49003619

amebel commented 10 years ago

It was agreed to use PredicateNode "Possession" instead of creating a new link type. It was also agreed that new atom types should be added sparingly. Refer to http://wiki.opencog.org/w/EvaluationLink#As_syntactic_sugar
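The "syntactic sugar" idea can be sketched as a rewrite from the would-be PossessionLink to the EvaluationLink form created by possessive-rule (truth values omitted). Atoms are modeled as nested Python tuples, and `desugar_possession`, `to_sexpr`, and the instance name `writing@1` are hypothetical, not OpenCog functions or data. The argument order (possessed thing first, possessor second) follows the PossessionLink examples earlier in this thread.

```python
# Atoms as nested tuples: nodes are (type, name), links are (type, *outgoing).

def desugar_possession(atom):
    """Rewrite (PossessionLink A B) into the EvaluationLink form; other atoms pass through."""
    if atom[0] != "PossessionLink":
        return atom
    _, possessed, possessor = atom
    return ("EvaluationLink",
            ("PredicateNode", "Possession"),
            ("ListLink", possessed, possessor))

def to_sexpr(atom):
    """Render a tuple atom as a Scheme-style Atomese string (no truth values)."""
    if atom[0].endswith("Node"):
        return f'({atom[0]} "{atom[1]}")'
    return "(" + " ".join([atom[0]] + [to_sexpr(a) for a in atom[1:]]) + ")"

sugar = ("PossessionLink", ("ConceptNode", "writing@1"), ("ConceptNode", "me"))
print(to_sexpr(desugar_possession(sugar)))
```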

sebastianruder commented 10 years ago

@williampma, I added the corrections here: https://github.com/opencog/opencog/pull/936 and here: https://github.com/opencog/relex/pull/130.

williampma commented 10 years ago

Those look good :)

As for ALLRULE2, I found that I cannot get that rule to apply at all, since RelEx does not give me _quantity($noun, all) & _poss($noun, $W) for sentences such as "All my writings are bad." and "All my oranges are tasty." Does anyone else have this problem?

linas commented 10 years ago

"all" is usually treated as a "predeterminer", not a "quantity". The rule you need is _predet($noun, all) & _poss($noun, $W).

Note that some sentences with "all" in them are handled incorrectly by RelEx; see bug #132.
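A toy backtracking matcher makes the difference visible. The relation tuples below are assumed (they follow the comment above rather than captured RelEx output), and `match` is a hypothetical helper, not part of RelEx or R2L. With _quantity the pattern finds no binding, while with _predet it binds $noun and $W.

```python
def match(pattern, relations, binding=None):
    """Backtracking match of (name, arg1, arg2) clauses; '$'-prefixed args are variables.
    Returns a dict of bindings, or None if some clause cannot be satisfied."""
    binding = dict(binding or {})
    if not pattern:
        return binding
    (name, *args), rest = pattern[0], pattern[1:]
    for rel_name, *rel_args in relations:
        if rel_name != name:
            continue
        trial = dict(binding)
        ok = True
        for p, v in zip(args, rel_args):
            if p.startswith("$"):
                if trial.setdefault(p, v) != v:  # variable already bound to something else
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok:
            result = match(rest, relations, trial)
            if result is not None:
                return result
    return None

# Assumed relations for "All my oranges are tasty." (illustrative, not captured RelEx output).
rels = [("_predet", "orange", "all"), ("_poss", "orange", "me"), ("_predadj", "orange", "tasty")]

print(match([("_quantity", "$noun", "all"), ("_poss", "$noun", "$W")], rels))  # None
print(match([("_predet", "$noun", "all"), ("_poss", "$noun", "$W")], rels))    # {'$noun': 'orange', '$W': 'me'}
```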

linas commented 10 years ago

Hmm. "Many of the boys knew it" has a _quantity but "Both of the boys knew it" and "All of the boys knew it" do not. "All three of the boys knew it." has an incorrect (undesired) parse.

This should probably be covered by a separate discussion and bug report.

williampma commented 10 years ago


I see. Thanks, @linas

williampma commented 10 years ago

This can be closed (refer to opencog/opencog#947), unless a more specialized representation turns out to be needed in the future.