opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace
Other
826 stars 234 forks source link

Split atoms copying and atom creation factory methods and generalize ClassFactory #2024

Open vsbogd opened 5 years ago

vsbogd commented 5 years ago

This issue is to discuss plan of removing unnecessary double atoms creation which were discussed at issue #1965, especially in comments:

At the moment createNode and createLink functions create instance of unspecific instance of Node or Link class and passes them into ClassServer::factory which creates another atoms with proper class.

Suggested plan:

linas commented 5 years ago

I think I like this. But first some general commentary, just so that things are clear.


OK that's the story of factories. The stories of validators are different.


So I think I like your proposal. Some point by point comments:

Sure, fine, don't care.

Uh sure, I guess ?? I'm not sure what this solves.

Yes. I think you discovered this in your unit test. I wrote splceFactory(), and although it is only 20 lines long, it took me two or three very long days to write it; I tried half a dozen other approaches, some of which were extremely complex, and took many hundreds of lines of impossible code. I was very happy when I finally found a simple way to create spliceFactory. By the end, I was so disgusted and tired, that I did not want to do it again (i.e. I did not feel like creating spliceValidator) . But your unit test discovered the bug that results from my laziness.

OK

Avoid copies, per above.

Careful, I had to fix a bug in ValueFactory just a few weeks ago. It works, I like the idea, but the use of the static is ... interesting, non-obvious, subtle, and error prone.

Uhh, sure I guess. I don't understand what that means. Keep in mind, most atoms should be "empty shells", unborn seeds, just passing through, temporary holders for a type, a node-string and a handle-seq. it should be simple and fast to create them, and then let them die as soon as they've moved thier atom-style-date from the API to the atomspace.

linas commented 5 years ago

OK, so here:

There a bad bug there. createNode should not call the classserver().factory(tmp->get_handle()) That is seriously fucked up. I don't know how that happened: that is an outright bug. It needs to be fixed as soon as possible. As explained above: createNode should create a small, simple "empty shell" of an atom, as it moves from scheme/python to its final home inside the atomspace. Factory should run once and only once!

https://github.com/opencog/atomspace/blob/e47dc46daa278c22cf3ba0c503e12b0e1e9b025d/opencog/atoms/base/Node.h#L128-L134

The above is a bug. Apparently, I am the one who added this bug, 16 months ago: https://github.com/opencog/atomspace/commit/7b709229e3f4ccd8156f02d585341df10615a076 I just now tried to fix it and 84 unit tests failed out of 143; most segfaulted. That's just crazy. 17 months ago, things worked fine, so what happened? Yecchhhh.

And here: https://github.com/opencog/atomspace/commit/44b212bad7efa3c92b9fc1a43103810a8cda1c57 It seems that I was trying to remove the factory from the atom table, so that the atom table code would be simpler. But after this change, the "empty shells" are no longer empty. So that was a mistake.

This stuff is hard. We don't have any good performance tests to catch these kinds of mistakes.

It's also possible that I am making a mountain out of a molehill. Maybe the main performance bottlenecks are in a completely different place...

linas commented 5 years ago

This comment:

This is a symptom of a general confusion about when to use temporaries/empty-shells, and when one has the "real" atom. Inserting an atom into the atomspace is expensive (its slow, there is lock contention) and so there is a lot of effort made to avoid putting atoms into the atomspace, until we are really sure we want to. As a result, there are a lot of temporaries and "empty shells" floating around. But sometimes, the empty shell is not good enough, because we need to call some method on the "real" c++ class. This results in a lot of hackery and confusion.

Maybe the correct long-term fix is that, whenever we need an actual C++ instance, and the FooAtomCast fails, then we should insert the atom into the atomspace, and try FooAtomCast, again.

vsbogd commented 5 years ago

Avoid copies, per above.

I agree that only place to make an atom copy is AtomTable. And may be it is better to not copy atoms at all, just use same instance in multiple atomspaces as you suggested in https://github.com/opencog/atomspace/issues/1967 . On the other hand I would allow copying in AtomTable itself till #1967 is not implemented.

vsbogd commented 5 years ago

There a bad bug there. createNode should not call the classserver().factory(tmp->get_handle())

As I see previous AtomTable code called _classserver.factory(Handle(createLink(closet, atom_type)));, effectively chain of calls was the same. If factory method signature will be changed from Handle (*)(const Handle& ) to Handle (*)(...) it will be possible to implement createNode and createLink methods by call of the factory itself.

vsbogd commented 5 years ago

Working on PR https://github.com/opencog/atomspace/pull/2029 I have realized that in case of multiple inheritance (for example NODE_C <- NODE_B, NODE_A) and when both NODE_A and NODE_B have validators then only validator of NODE_B will be called. Should we make a chain of validators calls to run all of type validators not only one?

linas commented 5 years ago

Avoid copies, per above.

1967

Lets ignore that, for now. #1967 is complicated, and I kind of want newcomers to not get confused about copying.

linas commented 5 years ago

it will be possible to implement createNode and createLink methods by call of the factory itself.

Yes, but what I am saying is that createNode and createLink almost surely should not be calling the factory. (?) They should be creating temporary "empty shells". The idea is that the "empty shell" avoids the CPU over-head of calling the ctor. The goal is that the "empty shell" is a small, light-weight, non-functional way to temporarily hold a type, a name-string, an handle-seq for just a little while.

vsbogd commented 5 years ago

They should be creating temporary "empty shells"

But I didn't see in code examples when empty shell is created and used before calling factory on it. And from my perspective it makes sense because all that you can do with empty shell is persist it. To call any method on it you need to convert empty shell to the instance of the right class.

linas commented 5 years ago

Should we make a chain of validators calls to run all of type validators

Yes.

linas commented 5 years ago

If you get the urge to rename ClassServer to something else, that would be OK too.

ngeiswei commented 5 years ago

I'm not sure either why you'd absolutely need an atomspace to fully instantiate and use an atom.

What is the problem with filling the shell before inserting to atomspace?

Inserting has a cost, sometimes and you'd rather keep redundant copies and avoid that cost. Well, I guess you can always have an atomspace per copy but that's an overhead.

linas commented 5 years ago

But I didn't see in code examples when empty shell is created and used before calling factory on it.

OK. It probably doesn't matter, at this time. What I'm trying to say is that if you put a printf into some atom constructor, you will see it print twice: once, when the atom is created outside the atomspace, and once when the private-copy is created. I'm mostly just torturing myself (and torturing you) over this; there is no obvious or easy fix for this, and I should just stop talking. But now you know: most atoms get created twice.

implement createNode and createLink methods by call of the factory itself.

Yes, go ahead and do this. Sorry to have derailed the conversation so much. #2029 looks good. Is everything here now done?

linas commented 5 years ago

Hi @ngeiswei

I'm not sure either why you'd absolutely need an atomspace to fully instantiate and use an atom.

Well, the classic example is banking. If there is one and only one globally-unique atom per bank-account, then you can safely store money in it. But if I have two atoms for one bank account, and I withdraw $500 from one of them, does that mean I still have $500 in the other?

For this reason, pretty much all databases since the dawn of time have insisted that atoms are always globally unique; chaos ensues when they are not. That said, there are 1001 exceptions where data does not have to be absolutely, totally unique.

vsbogd commented 5 years ago

Is everything here now done?

Not yet, will work on this further