opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace
Other
823 stars 234 forks source link

Proposal: TFValueOf #2004

Closed linas closed 4 years ago

linas commented 5 years ago

Proposal for a significant change in inner workings of pattern matcher and related subsystems.

Proposal: introduce a special binary true/false value on all atoms.

Why?

How?

OK, that's real easy, so far. Now the hard part. The stuff below is more controversial, and might have un-intended consequences:

Since changing cog-evaluate! will screw up some existing code, propose two new atom types.

Maybe-bonus that maybe I should NOT mention here, but ah what the heck:

  1. Short-term: Performance. There is a HUGE overhead to get in and out of scheme/python; its about 50 microseconds, which is crazy for arithmetic. Using TimesLink, etc will give a big performance boost.

  2. Long-term: By placing the PLN formulas in the atomspace itself, the formulas could be reasoned about. Or even learned. For example, MOSES could try to find optimal formulas for PLN to use. Some deep-learning net could do the same. It will be a while before we could actually do this, but at least it now would be possible. Its not possible, as long as the formulas are buried in black-box GroundedPredicateNodes (or wherever PL:N is keeping them).

linas commented 5 years ago

Off-topic, but there's another bonus side-effect: If we have a NewGroundedPredicateNode that takes the EvaluationLink as the first argument, then would could introduce a PythonEvaluationLink, as a kind-of EvaluationLink, but it would have a C++ class behind it, that looks like this:

class PythonEvaluationLink : EvaluationLink {
private:
    PyObject* pyDict;
    PyObject* pyUserFunc;
public:
    PythonEvaluationLink() {
           pyDict = PyModule_GetDict(pyModule);
           pyUserFunc = PyDict_GetItem(pyDict, "whatever");
    }

and then the python function could save local state into the pyDict, instead of storing it in ValuePtr. Basically, it would store whatever you wanted to stick into ValuePtr. And also store any other python stuff you might ever want to store.

You'd use it like so:

 (PythonEvaluationLink
      (NewGroundedpredicateNode "py:foobar")
      (ListLink (...args))

In particular, large parts, or most of, or even all of today's class PythonEval would be moved to the new class PythonEvaluationLink and so PythonEval would no longer be this wart on the side, but a first-class real-life atom.

So, then to make it easy for foobar to save state, it would get either the C++ PythonEvaluationLink atom as first argument, or maybe we would also have some python/cython code (in addition to the C++ code):

class PythonEvaluationLink:
    def __init__(self):
          _stuff = 0
    def save_user_state(self, user_stuff):
         _stuff = user_stuff
    def get_user_state(self):
        return _stuff

so that foobar just calls these two methods to set and restore state.

bgoertzel commented 5 years ago

So if I understand this one point correctly: if we have a GPN that is naturally thought of as returning some sort of complex TruthValue object, this TruthValue would be attached to the EvaluationLink?

On Tue, Jan 22, 2019 at 3:39 AM Linas Vepštas notifications@github.com wrote:

Off-topic, but there's another bonus side-effect: If we have a NewGroundedPredicateNode that takes the EvaluationLink as the first argument, then would could introduce a PythonEvaluationLink, as a kind-of EvaluationLink, but it would have a C++ class behind it, that looks like this:

class PythonEvaluationLink : EvaluationLink { private: PyObject pyDict; PyObject pyUserFunc; public: PythonEvaluationLink() { pyDict = PyModule_GetDict(pyModule); pyUserFunc = PyDict_GetItem(pyDict, "whatever"); }

and then the python function could save local state into the pyDict, instead of storing it in ValuePtr. Basically, it would store whatever you wanted to stick into ValuePtr. And also store any other python stuff you might ever want to store.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

-- Ben Goertzel, PhD http://goertzel.org

"The dewdrop world / Is the dewdrop world / And yet, and yet …" -- Kobayashi Issa

ngeiswei commented 5 years ago

@linas

Change GroundedPredicateNode to expect a bool return value, instead of a TV. This will hit a lot of code and examples. (It will simplify them, and make them easier to understand!)

I suppose you mean, change GroundedPredicateNode to support bool, alongside TV, right?

ngeiswei commented 5 years ago

So, if understand correctly you want to use a bit of Atom::_flags to store the boolean return value of a virtual link during pattern matching. But what if 2 pattern matchers are working over the same virtual link? Also why would you need to store it anyway, in most cases you just pass it around?

ngeiswei commented 5 years ago

Allow TruthValueOfLink to work with PlusLink, TimesLink, etc. This will enable arithmetic in the AtomSpace.

Do you mean that PlusLink, TimesLink, etc, could accept Value, not just NumberNode. So for instance one may write

(TimesLink (TruthValueOf A) (TruthValueOf B))

Yeah, that + short-hands like StrengthOf and ConfidenceOf would make this possible.

Since all PLN TV formulas are simple arithmetic, this will allow PLN tv formulas to be stored in the atomspace.

When confidence isn't taken into account, yes there are simple arithmetic. When confidence is taken into account, it's another story, it ideally requires a form of numerical integration, but sure, with adequate operators, we could (and ultimately should) do that.

ngeiswei commented 5 years ago

Also, as much as I like the idea of using boolean value for pattern matcher, we already have TruthValue::TRUE_TV() and TruthValue::FALSE_TV(). These are static so passing them around amounts to passing pointers. To take advantage of them we'd just need to add scheme/python/haskell bindings for them, and then have the pattern matcher assume their usage, so that comparing them would just amount to comparing pointers.

But I would also be perfectly happy to have a BoolValue (that would inherit from Value) or a BoolTruthValue (that would inherit from TruthValue) that would just hold true or false.

I'm not against storing some TFValue in Atom::_flags either, though I have to admit I feel a bit weird about it, but the devil, or the divine, is in the details.

ngeiswei commented 5 years ago

Change cog-evalutate! to return binary 0/1 as hinted in #1996

Currently cog-evaluate! over any most atom type returns its TV. For instance

(cog-evaluate! (Inheritance (stv 0.5 0.5) (Concept "A") (Concept "B")))

returns

(stv 0.5 0.5)

It also looks like (cog-evaluate! x) could be just turned into a shorthand for (cog-tv (cog-execute! x)) but that creates hairy cases.

For instance

(cog-tv (cog-execute! (Inheritance (stv 0.5 0.5) (Concept "A") (Concept "B"))))

is equivalent to

(cog-evaluate! (Inheritance (stv 0.5 0.5) (Concept "A") (Concept "B")))

However

(cog-tv (cog-execute! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg"))))

is not equivalent to

(cog-evaluate! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg")))

as it returns (stv 1 0) instead of the TV returned by my-gpn.

Maybe that is why you, @linas , suggest to store the resulting TFValue in the EvaluationLink, right?

noskill commented 5 years ago

It seems that there two separate usecases for logical operators in Atomese:

1) pattern matching on graph e.g. "or" means match such subgraph or such. 2) Computation of TruthValue, like it is done by URE.

It makes sense to separate this usecases by using different names e.g. PMOrLink which would expect boolean values and OrLink AndLink, IfThenElse etc. links which would be interpreted by URE or other interpreter.

vsbogd commented 5 years ago

For me main question of the issue is: PatternMatcher uses boolean logic, but EvaluationLinks (OrLink, AndLink, ...) return TruthValue which are converted into boolean by PatternMatcher code. How can we improve this conversion or eliminate it at all?

@linas, as I see you have suggested adding boolean values into atomspace but why not introduce BoolValue as regular atomspace value and modify PatternMatcher to work with it? Then we can change evaluation links (AndLink, OrLink, ...) implementation to return BoolValue. TruthValue for these links will be calculated by URE using specific logic rules. Or as Anatoly said add PatternMatcher specific PMAndLink, PMOrLink, ... to work with BoolValue.

If PatternMatcher query includes calculation of some atom which returns TruthValue it can be wrapped by EvaluationLink or ExecutionOutputLink which converts TruthValue to BoolValue. So author of the query can specify the conversion logic himself. To adapt already written PatternMatcher queries to new API we can introduce BoolValueOfTVLink which will convert TV to boolean using current PatternMatcher style.

ngeiswei commented 5 years ago

I think AndLink, OrLink and NotLink could be overloaded to work with BoolValue/TFValue, however we should stop using AndLink to wrap groundable non-virtual clauses and instead use PresentLink, and only put virtual clauses under an AndLink. So for instance

instead of

(Get
  (And
    (Inheritance (Variable "$X") (Variable "$Y"))
    (Inheritance (Variable "$Y") (Variable "$Z"))
    (Equal (Variable "$X") (Variable "$Z"))))

one should write

(Get
  (And
    (Present
      (Inheritance (Variable "$X") (Variable "$Y"))
      (Inheritance (Variable "$Y") (Variable "$Z")))
    (Equal (Variable "$X") (Variable "$Z"))))

PresentLink would have the effect of checking if a given pattern is present in the AtomSpace, return T if it is, F if it isn't. And EqualLink would return a BoolValue/TFValue too.

I know that it adds an extra PresentLink but it makes the semantics of the pattern matcher more transparent and in bonus has the potential to decrease the usage of quotations. For instance today if you want to fetch AndLink from the atomspace, you may write

(Get (LocalQuote (And (Variable "$X") (Variable "$Y"))))

but you can also write

(Get (Present (And (Variable "$X") (Variable "$Y"))))

The beauty is that PresentLink is already implemented and the examples above work. Since PresentLink is underused it probably has a few bugs here and there but they shouldn't be hard to fix. I have intended to convert all my code (URE, pattern miner, etc) to use PresentLink, which I'll do right after I'm done with my current tasks. IMO, it should become the norm rather than the exception, and eventually using AndLink to check the presence of groundable clauses should be phased out.

vsbogd commented 5 years ago

@ngeiswei, it is an excellent idea, we have discussed same thing with @noskill yesterday, but I didn't remember about PresentLink. It will make PatternMatcher queries more explicit.

I would add that in case if you need to calculate some predicate on grounded variable during pattern matching then you need some way to convert TruthValue to BoolValue/TFValue like BoolValueOfTVLink I have described above.

linas commented 5 years ago

Responding in chronological order. @bgoertzel asks:

So if I understand this one point correctly: if we have a GPN that is naturally thought of as returning some sort of complex TruthValue object, this TruthValue would be attached to the EvaluationLink?

Yes. If two different EvalLinks call the same GPN, then the return value is kept with the EvalLink that called it. And more: the EvalLink can keep track of all of the hidden state needed to make that GPN work efficiently.

linas commented 5 years ago

@ngeiswei asks

I suppose you mean, change GroundedPredicateNode to support bool, alongside TV, right?

No, I mean get rid of TV entirely. It is useless and completely unused; when you red through the code, all of it does some version of this:

TruthValuePtr tv = evaluate(...);
if (tv->getMean() > 0.5) {...} else {...}

which is dorky and pointless and a waste of CPU cycles. Instead, I propose this:

bool do_it= evaluate(...);
if (do_it) {...} else {...}

No waste, no fuss, no memory-reference counting, no pointless creation of TV values which are then immediately discarded.

linas commented 5 years ago

So, if understand correctly you want to use a bit of Atom::_flags to store the boolean return value of a virtual link during pattern matching. But what if 2 pattern matchers are working over the same virtual link? Also why would you need to store it anyway, in most cases you just pass it around?

I do not recall saying "virtual link" anywhere. I meant ordinary Atom. Perhaps this can be done without storing, but this is not obvious. Having a cached value you can always get, instead of always being force to recompute it is convenient. But perhaps it does not need to be stored. I'd have to think about that more.

linas commented 5 years ago

Remaining questions later; not ignoring you just busy.

ngeiswei commented 5 years ago

I suppose you mean, change GroundedPredicateNode to support bool, alongside TV, right?

No, I mean get rid of TV entirely

Well, as long as one can create grounded function to return TruthValue (or any Value type) I'm OK with that. It's just that in OpenCog/PLN literature a predicate node outputs TV. That isn't to say we can't change it. Or maybe we'd want to create a subtype GroundedBoolPredicateNode.

linas commented 5 years ago

@ngeiswei asks:

Do you mean that PlusLink, TimesLink, etc, could accept Value, not just NumberNode. So for instance one may write (TimesLink (TruthValueOf A) (TruthValueOf B)) Yeah, that + short-hands like StrengthOf and ConfidenceOf would make this possible.

You asked three questions, the answers are: No, yes, and yes.

First question: "Do you mean that PlusLink, TimesLink, etc, could accept Value," No. Links are still links, same as they ever were, and can only have atoms in their outgoing set; never values. However, many atoms are evaluatable. Such as EvaluationLink and ValueOfLink. If evaluatable atoms return values that can be interpreted as numbers, then PlusLink, TimesLink, etc, can add and multiply those numbers. '''This already works today''', it was implemented last summer, so that the neural net guys (@Necr0x0Der and crew) could use it for their neural nets. The working demo is here: https://github.com/opencog/atomspace/blob/master/examples/atomspace/stream.scm

Third question/remark: "StrengthOf and ConfidenceOf " Yes. It would take under an hour to implement these two, and everything else will just work automatically, already.

By the way, why does approximately half the code call it "mean", and the other half call it "strength"? Maybe we can pick one, or the other?

Finally, you made a comment:

simple arithmetic.

I mean "normal arithmetic" and not a simplified version of it. (there is a thing called "simple arithmetic" and I don't mean that) By arithmetic, it is meant any expression involving plus, minus, times, divide. Arithmetic excludes the possibility of infinite series, infinite sequences, min and max over infinite sets, limits, etc. so no sine, cosine, lim sup lim inf, integrals or analysis. Basically, arithmetic is of the same strength as first-order logic, while analysis requires second-order logic (i.e. requires the algebra of infinity).

linas commented 5 years ago

Regarding this comment:

However (cog-tv (cog-execute! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg")))) is not equivalent to (cog-evaluate! (Evaluation (GroundedPredicate "my-gpn") (Concept "my-arg")))

The meta-goal is to fix this, so that it would be equivalent. The only reason that it is not equivalent is, as far as I can tell, is because cog-execute! does not know what to do with a GPN, and ignores it. Are there cases where these should not be equivalent?

linas commented 5 years ago

@vsbogd writes:

For me main question of the issue is: PatternMatcher uses boolean logic, but EvaluationLinks (OrLink, AndLink, ...) return TruthValue which are converted into boolean by PatternMatcher code. How can we improve this conversion or eliminate it at all?

Bingo! Yes, exactly!

@linas, as I see you have suggested adding boolean values into atomspace but why not introduce BoolValue as regular atomspace value and modify PatternMatcher to work with it? Then we can change evaluation links (AndLink, OrLink, ...) implementation to return BoolValue. TruthValue for these links will be calculated by URE using specific logic rules. Or as Anatoly said add PatternMatcher specific PMAndLink, PMOrLink, ... to work with BoolValue.

Because this doesn't solve the main issue. Instead, it introduces a bunch of new atom types that are almost the same as some other atom types, but just slightly different. This is bad for multiple reasons:

Those are the reasons I dislike BoolValue.

linas commented 5 years ago

@ngeiswei writes:

we should stop using AndLink to wrap groundable non-virtual clauses and instead use PresentLink,

Yes, we should. However, this is extremely difficult, and requires a major deep dive and restructuring of the pattern matcher. I tried doing this maybe a year ago or two, and after a long week, I failed. I kept the branch, pushed it to github somewhere in my repo, in case I wanted to try again.(here: https://github.com/linas/atomspace/tree/absent-link-rework ) There are multiple existing issues to cover this already. #215 -- #217 -- #1596 -- #1882

linas commented 5 years ago

@ngeiswei

short-hands like StrengthOf and ConfidenceOf

This particular subtask seemed to be so straight-forward, that I simply just coded it up. It's here: pull req #2015 -- please review. I think it is more-or-less "complete". it can probably be extended to distributional TV's as well, or more generic values, but for now, its just for ordinary TV's,

ngeiswei commented 5 years ago

By the way, why does approximately half the code call it "mean", and the other half call it "strength"? Maybe we can pick one, or the other?

The problem is that in the Simple TV code "strength" may either correspond to a probability or a fuzzy value.

But I would argue that even in the case of a probability the term "mean" is misleading. Does it mean the mean frequency of an occurrence of some positive event? Does it is mean the actually probability estimate of that positive event? Current it is implemented as the former, which is different than the later, as the actual probability estimate requires a prior (typically a Beta distribution with a Jeffrey or Bayesian prior), and the mean as currently implemented is actually the mode of the second order distribution.

Hopefully the Distributional Value will clarify the terminology, first by removing the confusion between fuzzy and probability, and second by presenting the estimates, mean, mode, standard deviation, etc, as obtained from second order distributions.

linas commented 4 years ago

Closing. This was an interesting bu indeterminate conversation. The meta-issue was how to optimize the pattern engine for crsip true-false values, and pull req #2663 takes a step in that direction.