opencog / asmoses

MOSES Machine Learning: Meta-Optimizing Semantic Evolutionary Search for the AtomSpace (https://github.com/opencog/atomspace)
https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search
Other

Efficient Table Representation #16

Closed. ngeiswei closed this issue 6 years ago

ngeiswei commented 6 years ago

The various representations suggested in issues #3, #12 and #14 are great for reasoning but not so great for efficient calculations, hence the following suggestion: represent column values (i.e. the values associated with each feature) as a list of values living in the feature atom itself. For instance, assume we have the table

+--+--+--+
|o |f1|f2|
+--+--+--+
|1 |0 |1 |
+--+--+--+
|1 |1 |0 |
+--+--+--+
|0 |0 |0 |
+--+--+--+

The values of feature f1 would be represented as the list [0,1,0], attached to f1 via the Atom::setValue method. The key could be

Node "*-AS-MOSES:SchemaValuesKey-*"

and the ProtoAtom value could be

  1. FloatValue if f1 is numerical
  2. LinkValue if f1 is Boolean, in which case TrueLink or FalseLink could be used to represent true and false. An alternative would be to implement a BoolValue that directly holds C++ bool values, which would be more efficient.
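To make the proposal concrete, here is a minimal, self-contained C++ sketch of a feature atom holding its column under the SchemaValuesKey. Value, FloatValue, BoolValue and FeatureAtom below are hypothetical stand-ins written only for illustration; the real AtomSpace classes have different interfaces, and BoolValue is just the optional class suggested above.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the AtomSpace classes named in this issue;
// the real Atom, FloatValue and (proposed) BoolValue differ.
struct Value {
    virtual ~Value() = default;
};

// Column of a numerical feature.
struct FloatValue : Value {
    std::vector<double> data;
    explicit FloatValue(std::vector<double> d) : data(std::move(d)) {}
};

// Column of a Boolean feature holding C++ bools directly
// (the proposed BoolValue, not an existing class here).
struct BoolValue : Value {
    std::vector<bool> data;
    explicit BoolValue(std::vector<bool> d) : data(std::move(d)) {}
};

// A feature atom carrying key -> value pairs, mimicking Atom::setValue.
struct FeatureAtom {
    std::string name;
    std::map<std::string, std::shared_ptr<Value>> values;

    void setValue(const std::string& key, std::shared_ptr<Value> v) {
        values[key] = std::move(v);
    }

    // Returns nullptr when nothing is stored under the key.
    std::shared_ptr<Value> getValue(const std::string& key) const {
        auto it = values.find(key);
        return it == values.end() ? nullptr : it->second;
    }
};

// Plain string standing in for the key Node proposed above.
const std::string kSchemaValuesKey = "*-AS-MOSES:SchemaValuesKey-*";
```

With this, f1 from the table above would carry FloatValue{0, 1, 0} under the key, while f2, if treated as Boolean, would carry BoolValue{true, false, false}.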

That representation could be obtained directly from a Table or from the various existing representations. Since reasoning isn't needed yet, it should be fine to obtain it directly from the Table.
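Obtaining the representation directly from a Table essentially means transposing it from rows to columns. A rough sketch, assuming a hypothetical RowTable (a header plus rows of doubles) rather than the actual MOSES Table class:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical row-major table: column names plus rows of values,
// loosely modeled on the example table (not the real MOSES Table).
struct RowTable {
    std::vector<std::string> header;
    std::vector<std::vector<double>> rows;
};

// Transpose the table so each column name maps to its list of values,
// i.e. the per-feature list that would then be attached to the feature atom.
std::map<std::string, std::vector<double>> to_columns(const RowTable& t) {
    std::map<std::string, std::vector<double>> cols;
    for (const auto& name : t.header) {
        cols[name].reserve(t.rows.size());
    }
    for (const auto& row : t.rows) {
        assert(row.size() == t.header.size() && "ragged row");
        for (std::size_t j = 0; j < row.size(); ++j) {
            cols[t.header[j]].push_back(row[j]);
        }
    }
    return cols;
}
```

Applied to the example table, this yields o = [1,1,0], f1 = [0,1,0] and f2 = [1,0,0].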

Another thing we'll want to support is representing duplicated rows the way CTable does, but that's for another time and another issue.
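For reference, the idea behind CTable is roughly to store each distinct input row once, together with counts of the output values observed with it. A loose sketch of that idea; compress and the type aliases are hypothetical names, not the MOSES CTable API:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical CTable-like compression: each distinct input row maps to
// a count of how often each output value was observed with it.
using InputRow = std::vector<int>;
using OutputCounts = std::map<int, unsigned>;
using CompressedTable = std::map<InputRow, OutputCounts>;

// Fold (inputs, output) rows into the compressed form; a duplicated input
// row increments a counter instead of being stored again.
CompressedTable compress(const std::vector<InputRow>& inputs,
                         const std::vector<int>& outputs) {
    assert(inputs.size() == outputs.size());
    CompressedTable ct;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        ++ct[inputs[i]][outputs[i]];  // missing counts start at 0
    }
    return ct;
}
```

A column-per-feature representation would then need some extra per-row weight to carry these counts.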

ngeiswei commented 6 years ago

Implemented in https://github.com/singnet/as-moses/pull/3

linas commented 6 years ago

If you get the urge to implement BoolValue, that's OK, I guess. Initially, I wanted to stay minimalist.

linas commented 6 years ago

Also, it is OK to use the special-purpose key Node "*-AS-MOSES:SchemaValuesKey-*" for now, but in the long run, you will want to make this user-specifiable, maybe even per-feature. That way, you can wire in data sources from wherever.
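One way to make the key user-specifiable per feature is a small lookup with a default key and per-feature overrides. This is only an illustrative sketch: ValueKeyResolver is a hypothetical name, and keys are plain strings here rather than key Nodes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: decides which key a feature's values live under.
// Falls back to a default key unless the user overrides it for that feature.
class ValueKeyResolver {
public:
    explicit ValueKeyResolver(std::string default_key)
        : default_key_(std::move(default_key)) {}

    // Point a given feature at a different key (e.g. another data source).
    void set_key(const std::string& feature, std::string key) {
        overrides_[feature] = std::move(key);
    }

    // Key to read the feature's values from.
    const std::string& key_for(const std::string& feature) const {
        auto it = overrides_.find(feature);
        return it == overrides_.end() ? default_key_ : it->second;
    }

private:
    std::string default_key_;
    std::map<std::string, std::string> overrides_;
};
```

For example, constructing it with "*-AS-MOSES:SchemaValuesKey-*" and calling set_key("f2", "sensor-feed-key") (a made-up key) leaves f1 on the default key while f2 is read from the override.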

Note that some values are designed to be time-changing...