opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace

Storing generic source code in the AtomSpace. #2877

Closed linas closed 3 years ago

linas commented 3 years ago

This is a kind of "stupid computer trick" that should have been done a decade ago. It allows storing "arbitrary" code in the AtomSpace. It's the result of assorted conversations. It converts the AtomSpace into a database for storing arbitrary kinds of structured stuff, such as JSON, Python, "insert favorite language here"... thus making the AtomSpace a JSON database, a Python database, etc.

This is experimental and very rough. In fact, JSON, Python, etc. are not implemented; only s-expressions work right now. See the demo. I hope to add JSON maybe later today. Or maybe Python. Or maybe both. Might take a few days. Smoothing out the rough edges will take a good bit more. Edges will be smoothed only if there are actually any users who care about any of this.

Quoting the README:

Foreign Abstract Syntax Tree (AST) Examples

Consider the idea of a database that stores JSON expressions. JSON is a way of representing data in a certain labelled key-value format. JSON can be thought of as a certain kind of abstract syntax tree (AST) for data. The AtomSpace is a knowledgebase that stores trees. Therefore, one ought to be able to store JSON in the AtomSpace.
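To make the "JSON is a tree" observation concrete, here is a minimal sketch in plain Python that renders a parsed JSON value as an s-expression string. It does not use the AtomSpace at all; the helper name `to_sexpr` and the node labels `object` and `array` are made up for illustration and are not AtomSpace types.

```python
import json

def to_sexpr(value):
    """Render a parsed JSON value as an s-expression string.

    The labels "object" and "array" are hypothetical node names chosen
    for this sketch; they are not AtomSpace atom types.
    """
    if isinstance(value, dict):
        # Each key-value pair becomes a two-element sub-list.
        pairs = " ".join(
            f"({json.dumps(key)} {to_sexpr(val)})" for key, val in value.items()
        )
        return f"(object {pairs})" if pairs else "(object)"
    if isinstance(value, list):
        items = " ".join(to_sexpr(item) for item in value)
        return f"(array {items})" if items else "(array)"
    # Leaves (strings, numbers, booleans, null) keep their JSON spelling.
    return json.dumps(value)

doc = json.loads('{"fruit": ["apple", "cherry"], "count": 3}')
sexpr = to_sexpr(doc)
print(sexpr)
# (object ("fruit" (array "apple" "cherry")) ("count" 3))
```

Once the data is in this nested form, it is just a tree, which is the shape the AtomSpace stores.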

Such databases already exist, and some are even popular. This is because JSON is a fairly reasonable way of representing structured data. The goal of these examples is to observe that the source code for almost every programming language can be decomposed into an abstract syntax tree. With a parser in hand for some specific language, the resulting trees can be stored in the AtomSpace. That means that the language in question can now be used as a knowledge representation system; it can be used to store data, in the conventional sense of "a database". That database comes with a fairly powerful search engine/query system "for free": it's provided by the AtomSpace pattern engine. To the best of my knowledge, there does not exist any database that allows you to store data in a Python format. If there were such a thing, you could store Python in it. For example, one could store the python3 code

  x = list(("apple", set(( 1, 2, 3)), "cherry"))
  print("the result:", a+b)

Note that, because it is being stored, and not executed, the a and b in the example above do not need to be defined. A query on this dataset might be "find all expressions containing the word 'cherry'" or "find all expressions containing the symbol 'print'". Careful, though: the goal here is not to just store "source code"; that would be silly. Every competent programmer has their favorite tools for searching source code for certain expressions. The point here is that vast oceans of trees can be stored, and very complex queries can be performed on the data. Explicitly, the data is represented as a graph, and the queries are written in a (hyper-)graph query language (HQL).
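As a rough illustration of the two queries mentioned above, the sketch below uses Python's standard `ast` module as a stand-in for a stored tree. This is not the AtomSpace pattern engine or HQL; the helpers `statements_matching`, `is_string` and `is_symbol` are hypothetical names invented for this example. It only shows that parsed, unexecuted code can be searched structurally.

```python
import ast

source = '''
x = list(("apple", set(( 1, 2, 3)), "cherry"))
print("the result:", a+b)
'''

# Parsing builds the tree without running anything,
# so the undefined names a and b cause no error.
tree = ast.parse(source)

def statements_matching(tree, predicate):
    """Return the source text of each top-level statement that
    contains at least one node satisfying the predicate."""
    return [
        ast.get_source_segment(source, stmt)
        for stmt in tree.body
        if any(predicate(node) for node in ast.walk(stmt))
    ]

def is_string(text):
    # Matches a string literal equal to `text`.
    return lambda node: isinstance(node, ast.Constant) and node.value == text

def is_symbol(name):
    # Matches an identifier (not a string literal) called `name`.
    return lambda node: isinstance(node, ast.Name) and node.id == name

with_cherry = statements_matching(tree, is_string("cherry"))
with_print = statements_matching(tree, is_symbol("print"))

print(with_cherry)
print(with_print)
```

Unlike a plain text search, the second query matches `print` only where it appears as a symbol, so a string literal containing the word "print" would not count.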

Well, that's all very nice. But for now, the existing code is still experimental and very incomplete. The demos explore the limits of what is actually possible.

linas commented 3 years ago

Merging. A PLN unit test is stalling; it is almost surely not due to this.