krotik / eliasdb

EliasDB, a graph-based database.
Mozilla Public License 2.0

Index and query nested attributes? #9

Closed: mmindenhall closed this issue 7 years ago

mmindenhall commented 7 years ago

Very cool project! Just saw it today for the first time.

What are your thoughts on supporting nested JSON structures rather than just primitives within nodes? This is a requirement for the project I'm working on, and I'm willing to help if it's a feature you'd care to add.

For example, earlier today after working through the tutorial, I tried to store the JSON structure below:

> store
{
   "nodes": [
      {
         "key":"mytest",
         "kind":"Test",
         "int":42,
         "float":3.1415926,
         "str":"foo bar",
         "nested":{
            "nested_int":12,
            "nested_float":1.234,
            "nested_str":"time flies like an arrow"
         }
      }
   ]
}

On the first try, I got this result:

SyntaxError: JSON.parse: expected property name or '}' at line 1 column 4 of the JSON data

After validating the JSON, I suspected that might be masking a different error, so I put all of the JSON on a single line and tried again:

GraphError: Could not write graph information (gob: type not registered for interface: map[string]interface {}) 

So I added the appropriate gob.Register call and rebuilt. After that, the store operation succeeded, but of course I was unable to query based on nested values. With this feature, I would expect all of the following queries to return the node:

> get Test where nested.nested_int = 12
> get Test where nested.nested_float > 1.0
> get Test where nested.nested_str beginswith "time"
> index Test nested.nested_int value 12
> index Test nested.nested_str word "flies"

krotik commented 7 years ago

Hi Mark,

What an intriguing idea! In the current version, a node only has the notion of a key with a primitive value or a list of primitive values, so you would need to store the nested JSON as a string:

{
   "nodes": [
      {
         "key":"mytest",
         "kind":"Test",
         "int":42,
         "float":3.1415926,
         "str":"foo bar",
         "nested": "{ \"nested_int\":12, \"nested_float\":1.234, \"nested_str\":\"time flies like an arrow\" }"
      }
   ]
}
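Writing the escaped string above by hand is error-prone; in Go, encoding/json can produce and parse it. A minimal sketch, assuming the nested object is simply kept as an ordinary string attribute; encodeNested and decodeNested are hypothetical helpers, not part of EliasDB:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeNested serializes a nested object so it can be stored as a plain
// string attribute (hypothetical helper).
func encodeNested(obj map[string]interface{}) (string, error) {
	b, err := json.Marshal(obj)
	return string(b), err
}

// decodeNested parses such a string attribute back into a map. Note that
// encoding/json decodes every JSON number into a float64.
func decodeNested(s string) (map[string]interface{}, error) {
	var obj map[string]interface{}
	err := json.Unmarshal([]byte(s), &obj)
	return obj, err
}

func main() {
	s, err := encodeNested(map[string]interface{}{
		"nested_int":   12,
		"nested_float": 1.234,
		"nested_str":   "time flies like an arrow",
	})
	if err != nil {
		panic(err)
	}
	// json.Marshal sorts map keys, so the output is deterministic.
	fmt.Println(s) // {"nested_float":1.234,"nested_int":12,"nested_str":"time flies like an arrow"}

	obj, err := decodeNested(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(obj["nested_int"]) // 12 (as float64)
}
```

The drawback is that the database only sees an opaque string, so a query such as nested.nested_int = 12 cannot look inside it.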

What you are proposing is that EliasDB actually understands the value as a JSON object.

A major pain point here is Go's static type system: for Go to understand an object, it needs to know its structure. However, your request would require it to understand objects nested to an arbitrary depth (or at least up to some level).
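For reference, Go's standard library can already hold values of unknown shape: encoding/json decodes arbitrary JSON into interface{} values built from map[string]interface{}, []interface{} and primitives, and a type switch can walk these recursively without declaring the structure up front. The depth function below is a small illustrative sketch, not EliasDB code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// depth returns how many levels of JSON objects and arrays v contains;
// primitives (numbers, strings, bools, nil) have depth 0.
func depth(v interface{}) int {
	deepest := 0
	switch t := v.(type) {
	case map[string]interface{}:
		for _, child := range t {
			if d := depth(child); d > deepest {
				deepest = d
			}
		}
		return deepest + 1
	case []interface{}:
		for _, child := range t {
			if d := depth(child); d > deepest {
				deepest = d
			}
		}
		return deepest + 1
	}
	return 0
}

func main() {
	var v interface{}
	data := []byte(`{"nested": {"nested_int": 12, "list": [1, {"x": 2}]}}`)
	if err := json.Unmarshal(data, &v); err != nil {
		panic(err)
	}
	fmt.Println(depth(v)) // 4
}
```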

I think it is a good idea and that it should be included, unless it would significantly break something I can't see at the moment.

Here are my thoughts:

On the storage level, I am not sure whether the structure should be stored as Go objects or as a JSON-encoded string. The best option would be to store it as actual Go data structures, i.e. maps, lists and primitive types. However, I fear that every single combination would have to be registered with gob. If that is the case, it would have to be stored as a JSON string.

The REST API should obviously detect that it was given a nested structure and process it appropriately.
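Detection on the REST side could be as simple as decoding the request body with encoding/json and checking which attribute values come back as map[string]interface{}. This is a sketch under that assumption; nestedAttrs and the request struct are hypothetical and not EliasDB's actual handler:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// nestedAttrs returns the sorted names of the attributes whose value is a
// JSON object, i.e. was decoded by encoding/json into map[string]interface{}.
func nestedAttrs(node map[string]interface{}) []string {
	var names []string
	for name, v := range node {
		if _, ok := v.(map[string]interface{}); ok {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	body := []byte(`{"nodes": [{"key": "mytest", "kind": "Test", "int": 42,
		"nested": {"nested_int": 12, "nested_str": "time flies like an arrow"}}]}`)

	// Hypothetical request shape mirroring the store example above.
	var req struct {
		Nodes []map[string]interface{} `json:"nodes"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		panic(err)
	}
	for _, node := range req.Nodes {
		fmt.Println(node["key"], nestedAttrs(node)) // mytest [nested]
	}
}
```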

The EQL parser/interpreter would need to understand such structures (I think the lexer should be fine). The syntax you are proposing seems fine, though I am not quite sure how equality between objects should be modeled.
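One possible answer to the equality question is plain structural equality. The sketch below assumes that semantics, which is only one option: maps need the same keys with pairwise-equal values, lists need the same length and order, and numbers compare by value regardless of Go type, since JSON input decodes to float64 while stored values may be int. objectsEqual and toFloat are hypothetical helpers:

```go
package main

import (
	"fmt"
	"reflect"
)

// toFloat reports whether v holds a number and returns it as float64.
func toFloat(v interface{}) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case float64:
		return n, true
	}
	return 0, false
}

// objectsEqual compares nested values structurally; numbers compare by
// value whatever their Go type, so 12 equals 12.0.
func objectsEqual(a, b interface{}) bool {
	if x, ok := toFloat(a); ok {
		y, ok := toFloat(b)
		return ok && x == y
	}
	switch av := a.(type) {
	case map[string]interface{}:
		bv, ok := b.(map[string]interface{})
		if !ok || len(av) != len(bv) {
			return false
		}
		for k, v := range av {
			w, found := bv[k]
			if !found || !objectsEqual(v, w) {
				return false
			}
		}
		return true
	case []interface{}:
		bv, ok := b.([]interface{})
		if !ok || len(av) != len(bv) {
			return false
		}
		for i := range av {
			if !objectsEqual(av[i], bv[i]) {
				return false
			}
		}
		return true
	}
	// Strings, bools, nil and anything else: fall back to deep equality.
	return reflect.DeepEqual(a, b)
}

func main() {
	stored := map[string]interface{}{"nested_int": 12, "nested_str": "time flies"}
	query := map[string]interface{}{"nested_int": 12.0, "nested_str": "time flies"}
	fmt.Println(objectsEqual(stored, query)) // true
}
```

Other choices, such as order-insensitive lists or subset matching, would be equally defensible; this only fixes one of them for illustration.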

krotik commented 7 years ago

Hi Mark,

I have now done some work on this, and I think the current code should cover your use cases. You can now store objects with multiple levels of nesting.

In the low-level storage, these objects are represented as Go map[string]interface{} values (i.e. you can have map[string]map[string]interface{}, etc.).

EQL and the full-text search index both support nested structures. EQL distinguishes between a nested object (i.e. nested : { nestlevel1 : { nestlevel2 : { ... ) and an attribute name containing dots (i.e. nested.nestlevel1 : { nestlevel2 : { ... ). In EQL, an attribute name that contains dots can be declared with the attr: prefix. It is also possible to show only parts of a nested object by using the new "objget" function in the show clause.
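The difference between a nested object and a dotted attribute name can be illustrated with a small path lookup over nested maps. This is a hypothetical helper that only mimics the idea behind objget; it is not EliasDB's implementation. Walking the keys "nested" then "nestlevel1" reaches the nested value, while the single key "nested.nestlevel1" addresses the flat attribute, which is the case the attr: prefix is for:

```go
package main

import "fmt"

// lookup follows a path of keys through nested maps and reports whether a
// value was found (hypothetical helper, not EliasDB's objget).
func lookup(attrs map[string]interface{}, path ...string) (interface{}, bool) {
	var cur interface{} = attrs
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false // tried to descend into a non-object
		}
		if cur, ok = m[key]; !ok {
			return nil, false // key missing at this level
		}
	}
	return cur, true
}

func main() {
	node := map[string]interface{}{
		// A genuinely nested object: nested -> nestlevel1.
		"nested": map[string]interface{}{"nestlevel1": "deep"},
		// A flat attribute whose name merely contains a dot.
		"nested.nestlevel1": "flat",
	}
	v1, _ := lookup(node, "nested", "nestlevel1")
	v2, _ := lookup(node, "nested.nestlevel1")
	fmt.Println(v1, v2) // deep flat
}
```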

The full-text search works as you describe in your example. Note that when searching for a word or phrase, you don't need to put it in quotes.
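One way a word index over nested values could work is to flatten the object into dotted attribute paths and record each word under the path it came from. A minimal sketch, assuming whitespace tokenization and lower-casing; indexWords is hypothetical, and EliasDB's real full-text index is likely more involved:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// indexWords walks a nested attribute map and records, for every word of a
// string value, the dotted path of the attribute it occurs in. Words are
// lower-cased and split on whitespace only; punctuation is not stripped.
func indexWords(prefix string, attrs map[string]interface{}, index map[string][]string) {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order of paths per word

	for _, k := range keys {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		switch v := attrs[k].(type) {
		case map[string]interface{}:
			indexWords(path, v, index) // descend into nested object
		case string:
			for _, w := range strings.Fields(strings.ToLower(v)) {
				index[w] = append(index[w], path)
			}
		}
	}
}

func main() {
	node := map[string]interface{}{
		"str":    "foo bar",
		"nested": map[string]interface{}{"nested_str": "time flies like an arrow"},
	}
	index := map[string][]string{}
	indexWords("", node, index)
	fmt.Println(index["flies"]) // [nested.nested_str]
	fmt.Println(index["foo"])   // [str]
}
```

Note that naive dotted flattening like this cannot tell a nested path apart from a flat attribute whose name contains dots, which is exactly the ambiguity EQL resolves with the attr: prefix.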

See the attached screenshot for details:

[Screenshot: nested_attributes]