opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace

Decentralized (vs. Distributed vs. Federated) #2138

Closed linas closed 1 year ago

linas commented 5 years ago

Ongoing remarks about a decentralized atomspace. Pertains to issues #1855, #1502 and #1967. The key concepts are:

These three concepts are often confused with one another and taken to be synonyms; they are not. The comments below unpack each in greater detail, listing the pros and cons.

linas commented 5 years ago

A distributed atomspace attempts to maintain the illusion that there is one single set of data, of which any particular machine might hold just a few shards. The concepts of "ACID" and "BASE" apply. For example, atomic updates and eventual consistency are both strategies for updating data in a way that maintains a consistent data state: either immediately, by locking ("ACID"), or eventually, by propagation of updates ("BASE").
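As a rough illustration of the two update strategies, here is a minimal in-memory sketch in Python. It is not AtomSpace code: `LockedStore`, `EventualStore` and `propagate` are hypothetical names invented for this example, and the "machines" are just dictionaries in one process. The first class makes every update visible everywhere at once by holding a lock; the second applies a write locally and queues it for the other replicas.

```python
import threading
from collections import deque


class LockedStore:
    """ACID-style sketch: one lock guards all replicas (hypothetical class)."""

    def __init__(self, n_replicas):
        self._lock = threading.Lock()
        self.replicas = [dict() for _ in range(n_replicas)]

    def update(self, key, value):
        # Every replica is changed while the lock is held, so a reader
        # can never observe a half-applied update.
        with self._lock:
            for replica in self.replicas:
                replica[key] = value

    def read(self, replica_index, key):
        with self._lock:
            return self.replicas[replica_index].get(key)


class EventualStore:
    """BASE-style sketch: write locally, propagate later (hypothetical class)."""

    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = deque()  # updates not yet delivered to other replicas

    def update(self, replica_index, key, value):
        # The write is visible on the local replica immediately ...
        self.replicas[replica_index][key] = value
        # ... and queued for delivery to every other replica.
        for i in range(len(self.replicas)):
            if i != replica_index:
                self.pending.append((i, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Deliver all queued updates; afterwards the replicas agree.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value
```

With `EventualStore`, a read on another replica returns stale data (here, `None`) until `propagate()` has run; with `LockedStore`, that window does not exist, at the cost of every reader and writer contending for one lock.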

The pros and cons:

Please note that the atomspace is already distributed. See the demo here. This can be made to work on a large scale, because Postgres is already massively scalable. So, in a certain sense, that part is done. What remains unsolved are the multi-user and authority-of-update issues surrounding this.

linas commented 5 years ago

A decentralized atomspace acknowledges that there is no single master copy, and that instead there are peers. Some peers might be more authoritative, more correct, or more knowledgeable than others, but the process for determining who is authoritative can be made to lie outside of the atomspace implementation. Determination of authority is done at some other layer, and is not hard-wired into the atomspace design.
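One way to picture "authority decided at another layer" is a peer whose merge step calls out to a policy function that it is handed from outside. The following Python sketch is purely illustrative: `Peer`, `merge_from` and `trust_ranking` are hypothetical names, not part of any AtomSpace API, and the "data" is a plain dictionary of values tagged with the peer that asserted them.

```python
from typing import Callable, Dict, Tuple

# A claim records who asserted a value: (peer name, value).
Claim = Tuple[str, float]
# A resolver picks between two conflicting claims for the same key.
Resolver = Callable[[str, Claim, Claim], Claim]


class Peer:
    """A peer with no built-in notion of authority (hypothetical class)."""

    def __init__(self, name: str, resolve: Resolver):
        self.name = name
        self.resolve = resolve  # authority policy, supplied from outside
        self.data: Dict[str, Claim] = {}

    def assert_value(self, key: str, value: float) -> None:
        self.data[key] = (self.name, value)

    def merge_from(self, other: "Peer") -> None:
        # Copy what we lack; delegate every conflict to the resolver.
        for key, claim in other.data.items():
            if key not in self.data:
                self.data[key] = claim
            else:
                self.data[key] = self.resolve(key, self.data[key], claim)


def trust_ranking(ranking: Dict[str, int]) -> Resolver:
    """One possible policy: prefer the claim from the higher-ranked peer."""

    def resolve(key: str, mine: Claim, theirs: Claim) -> Claim:
        if ranking.get(theirs[0], 0) > ranking.get(mine[0], 0):
            return theirs
        return mine

    return resolve
```

Two peers given different rankings will settle the same conflict differently, which is the point: the storage layer only merges, and whoever supplies the resolver decides who is believed.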

Pros and cons:

linas commented 5 years ago

The concept of federation is that everybody runs their own server, and the servers exchange data with one another. Classic examples of federation are email servers, IRC servers, diaspora pods, etc. That is, there are owners/admins who run the server, and lots of users who use the server.

For the atomspace, users communicate with the servers using REST, or protobuf, or ZeroMQ, or ROS messages, or whatever. (I don't care, as long as the performance is good and the API is maintained.)

Pros and cons:

linas commented 5 years ago

The goal of this issue is to define some way of having decentralized atomspaces without the downsides of federation, and without the authority-control issues of a distributed atomspace.

linas commented 4 years ago

Notes:

linas commented 4 years ago

The https://github.com/opencog/atomspace-cog/ client-server implementation provides a reasonably-fast quasi-peer-to-peer quasi-distributed atomspace. A collection of these could provide a true decentralized implementation if two things are provided:

linas commented 4 years ago

Candidate key-value stores:

Rocks seems to offer better and more balanced performance. leveldb seems to have 15% smaller files than rocks. hyperlevel trades 2x faster writes for 4.5x slower queries.

Done. See https://github.com/opencog/atomspace-rocks/

linas commented 4 years ago

Assembling pieces-parts: https://github.com/opencog/atomspace-agents/

Why: copy of long email:

I want to talk about "service meshes". The problem with shopping for cassandra, or any of the other suggested databases, is that they are all "monolithic black boxes". You pick one, and you get what you get: whatever is provided, that's what it is. Sure, some configuration files somewhere allow you to tune this and that, but that's all.

The service mesh idea (and the npm/js idea before that) is to assemble your system out of small, self-contained pieces. Sure, the object-oriented folks have been talking about this for 3 or 4 decades, and it's cited as the raison d'être for things like C++. But C++ never lived up to this ideal. There are no generic C++ frameworks. None. At All. (OK, so SGI had one or two in the early 1990s ...) Something is ... missing ... in C++. Compare this to node.js and npm, which are wildly successful over-achievers in this category. People regularly build large applications by assembling a cacophony of tiny little javascript parts. Clearly, javascript has something that C++ does not. Something that makes the OO dream achievable not just in theory, but regularly validated in practice.

Now, there are some downsides to npm apps: they contain hundreds or thousands of parts, not all of them are well-maintained, and many have published security vulnerabilities that remain unpatched. Worse, patching some of them requires incompatible API changes that would break users. So it has its own prickly and thorny issues that are unique and different from those that other languages (python, scheme, c++) suffer from.

In the cloud world, there has long been, and continues to be, a movement toward meshes of containerized applications. Here, docker is the prototypical container -- lxc/lxd/lxe more generally. Managing these containers requires kubernetes, and more: the "service meshes" (istio, microsoft open service mesh) provide a layer (a "control plane") that further manages deployments, error fallbacks, a/b testing, circuit-breakers, load-balancing, etc. The mental model is that containerized apps are just like npm nodes, except they are a million times bigger and beefier (literally) and they all have network interfaces instead of javascript methods/objects. And since they are so much bigger, they need more active management.

Now compare the service-mesh idea to the olde-fashioned ideas of "web shopping carts" or "content management systems" or "customer relationship management systems". Those things were single, monolithic black boxes that you bought from a vendor (or installed via open-source) that automagically did everything for you, once you configured a few templates. They worked great, as long as what you wanted was (a) a web shopping cart, and (b) was customizable via some template or config file. If not .. you were SOL.

These monolithic architectures were their own downfall, and were the driver toward containers, kubernetes and service meshes. The founders of cloud startup XYZ can't spray-paint some config files onto a monolith and then raise $20M in venture funding. But give them a bunch of pieces-parts containers that they can hook up in some new, novel and exciting way, plus a little secret sauce, and buzzword-bingo, a unicorn is born.

And this is why Cassandra makes me yawn with disinterest, if not a bit of hostility. It's a big monolithic block. Sure, I can take the AtomSpace, and plaster it onto Cassandra, like wrapping some wet paper around a rock. The ultimate shape is still that of the rock, no matter how brightly-colored or thoughtful that paper wrapped around it is.

So, I'm trying to grab hold of this idea of pieces-parts. OpenCog needs pieces-parts that can be arranged and re-assembled into that mesh that provides the distributed-atomspace attributes and requirements du-jour.

Yes, of course, singularity.net is also pursuing a vision of pieces-parts that can be assembled. Which is why I am a bit dumbfounded that we are entertaining ideas like Cassandra -- it is the very antithesis of modular architecture. It's the opposite of a dapp -- it's a big giant lump, the one ring to rule them all. It's kind of exactly the poster-child for what not to do ...

For a distributed atomspace, what we really need to focus on is interoperability, so that, like javascript (and unlike c++), it is easy to assemble modules out of other modules. Like containers, there should be some fairly regularized API for communications (I nominate atomese-as-ascii-strings, i.e. s-expressions, and maybe, as plan B, atomese-as-json). With this under control, we can move on to creating unique, custom services aka agents aka dapps or whatever these other things might be.
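To make the "atomese-as-ascii-strings" nomination concrete, here is a small Python sketch of what such a wire format could look like. Everything in it is an assumption made for illustration: atoms are modeled as nested tuples, the type names `Concept` and `Inheritance` are just examples, the functions `to_sexpr`, `parse_sexpr` and `to_jsonable` are hypothetical helpers rather than AtomSpace APIs, and the JSON shape (`type`, `name`, `outgoing`) is one guess at what a plan-B encoding might be.

```python
import json
import re

# An atom is modeled as a nested tuple (an assumption of this sketch):
#   node: ("Concept", "cat")
#   link: ("Inheritance", ("Concept", "cat"), ("Concept", "animal"))

# Tokens: parentheses, double-quoted strings, or bare symbols.
_TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def _is_node(rest):
    return len(rest) == 1 and isinstance(rest[0], str)


def to_sexpr(atom):
    """Serialize a tuple-atom to an s-expression string."""
    kind, *rest = atom
    if _is_node(rest):
        # json.dumps gives a double-quoted, escaped string literal.
        return f"({kind} {json.dumps(rest[0])})"
    return "(" + " ".join([kind] + [to_sexpr(a) for a in rest]) + ")"


def parse_sexpr(text):
    """Parse an s-expression produced by to_sexpr back into a tuple-atom."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1
        kind = tokens[pos]
        pos += 1
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(read())
            else:
                items.append(json.loads(tokens[pos]))
                pos += 1
        pos += 1  # consume ")"
        return (kind, *items)

    return read()


def to_jsonable(atom):
    """Convert a tuple-atom to a JSON-ready dict (hypothetical schema)."""
    kind, *rest = atom
    if _is_node(rest):
        return {"type": kind, "name": rest[0]}
    return {"type": kind, "outgoing": [to_jsonable(a) for a in rest]}
```

A sender would call `to_sexpr(atom)` and ship the resulting string over whatever transport is at hand; a receiver calls `parse_sexpr` on it. The JSON variant goes through `json.dumps(to_jsonable(atom))`. Because both ends agree only on a string format, the modules on either side can be written in any language.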

linas commented 1 year ago

Closing. Everything here is now possible with a combination of ProxyNodes and StorageNodes. See https://wiki.opencog.org/w/Networked_AtomSpaces for details.