Followup to blog post - Githubissues

jmfarp2011 commented 9 years ago

This issue is a follow up to the discussion at http://kleineblase.wordpress.com/2013/11/01/graph-database-in-javascript.

I don't plan to maintain the two repos separately, but if there is a need, then I certainly will consider it.

"Key collision is a matter of DB design." Agreed, I'll be opening a separate issue addressing a better indexing/collision handler, and correcting the modification of the String.prototype. The intent I think will be to have a default behavior where the entire entity is hashed via sha1, but that default handler can be overridden with a custom handler.

"Actually I’m leaning towards generating the graph in Perl" Then maybe I need to look at separating the Node javascript and the client javascript, reusing objects as much as possible, and using grunt to build the two versions of the end javascript... thoughts?

"calling a JS function to obtain the future key is not possible." That was never the intention. In fact, the intention is that the only id generated on the client is a temporary client-id when a new entity is created on the client and then that client-id is overridden by the id returned from the server once the entity is created on the server. If the object comes to the client with an id, that should be the id used on the client. I'll also open a ticket to resolve this in the code.

jmfarp2011 commented 9 years ago

The String.prototype modification, custom index method, and the client id issues have all been resolved in the master branch now.

The String.prototype is no longer modified to add a sha1 method. Instead, sha1 is a private method of the Graph.Collection object and is used as the default indexing method for generating client ids.

The indexing method can now be overridden with a custom function by supplying an indexGenerator function to the options when instantiating a Graph.Database object. For example,

var testDb = new Graph.Database({ 
  indexGenerator: function(e){ return e.type+''+e.name; }
});

Finally, client id generation checks for an existing 'id' key and then a 'cid' key, in that order, and generates a client id only if the object has neither. The id comes from the custom index generator method, if one is supplied. If no custom function is supplied, it is generated as a sha1 hash string from the JSON.stringify representation of the object.
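
The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in the master branch: resolveId is a hypothetical helper name, and Node's built-in crypto module stands in for the private sha1 method.

```javascript
// Node built-in; a browser build would need a different SHA-1 source.
const crypto = require('crypto');

// Hypothetical helper, not the actual Graph.js implementation.
function resolveId(entity, indexGenerator) {
  if (entity.id !== undefined) return entity.id;    // 1. an existing id wins
  if (entity.cid !== undefined) return entity.cid;  // 2. then an existing client id
  if (typeof indexGenerator === 'function') {
    return indexGenerator(entity);                  // 3. custom generator, if supplied
  }
  // 4. default: SHA-1 of the JSON.stringify representation
  return crypto.createHash('sha1').update(JSON.stringify(entity)).digest('hex');
}

const byTypeAndName = function (e) { return e.type + '' + e.name; };

resolveId({ id: 7, cid: 'c1' }, byTypeAndName);           // 7
resolveId({ type: 'word', name: 'han' }, byTypeAndName);  // 'wordhan'
resolveId({ type: 'word', name: 'han' });                 // 40-character hex digest
```

One caveat with the default: JSON.stringify follows property insertion order, so two objects with the same content but different key order would hash to different ids.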

daniel-pfeiffer commented 9 years ago

So I got a chance to play with Neo4j. Mixed blessing: we actually managed to hang it with a few queries, and when it did run, a -[*]-> query over 200 000 nodes took 10s (far beyond the milliseconds I was led to believe). And the corresponding query in the reverse direction never came back :-( Not sure how much time we'd need to put into optimization. However, I really liked the Cypher language.

Now I wouldn't expect a small project like yours to have a full-blown query language. Finding direct neighbours is fine for my needs. However, querying for nodes by regexp would be great. (I want to look up Korean vocabulary, which alas has many homophones with wildly varying spellings and many more similar-sounding words.)

Your indexGenerator looks promising. But I'm thinking that the syntax is still not human editable enough. So I'm tending towards a simple map&list YAML for input. YAML has this reference syntax

First occurrence: &anchor Value
Second occurrence: *anchor

where *anchor is an alias to the node preceded by &anchor, i.e. "Value". This alas requires it to have been defined earlier. That's a headache for cycles (though it's technically possible by nesting referenced nodes) and a no-go for a clean sorted layout. So I'm thinking about non-YAML-style lists-of-indexes instead, but that'll need more than one pass to connect everything. Not sure if that should be done in JS (your call?) or Perl.

node1:
     some attribute: value
     related to: ['node2', 'node3']
     opposite: node4
node2:
     ...
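
If the connecting were done in JS, the "more than one pass" idea could look roughly like this. It is only a sketch under assumptions: the input is taken to be already parsed into a plain object, link is a hypothetical function, and the caller has to say which attributes hold node names.

```javascript
// Hypothetical input, shaped like the layout above after parsing.
const source = {
  node1: { 'some attribute': 'value', 'related to': ['node2', 'node3'], opposite: 'node4' },
  node2: {},
  node3: {},
  node4: { opposite: 'node1' }
};

// Assumption: referenceKeys lists the attributes whose values are node names.
function link(source, referenceKeys) {
  const nodes = Object.create(null);
  // Pass 1: create every node, so forward references and cycles can resolve.
  for (const name of Object.keys(source)) {
    nodes[name] = { name: name, attributes: {}, edges: {} };
  }
  // Pass 2: copy plain attributes and turn node names into object references.
  for (const name of Object.keys(source)) {
    for (const [key, value] of Object.entries(source[name])) {
      if (!referenceKeys.includes(key)) {
        nodes[name].attributes[key] = value;
        continue;
      }
      const targets = Array.isArray(value) ? value : [value];
      nodes[name].edges[key] = targets.map(function (target) {
        if (!(target in nodes)) throw new Error('Unknown node: ' + target);
        return nodes[target];
      });
    }
  }
  return nodes;
}

const graph = link(source, ['related to', 'opposite']);
```

Because every node exists before any reference is resolved, the order of entries in the file no longer matters, which keeps a sorted layout possible.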

jmfarp2011 commented 9 years ago

I am curious about the ability to do such queries with another database system... JOIN hell?

I think the regex would be a nice addition to the filter methods:

  q.filter({attribute__regex: /([A-Z])\w+/})

I think I will work on this soon.

I have actually been trying to conceptualize a method to ingest the datasource from a JSON serialized graph, rather than a list of entities and edges. This seems something like your non-YAML-style example above. I think that will be a better method to cache the data on the client, and then reinitialize the database from that cache. Store the data as a graph. Or, utilize the client-side database for the current entity's graph, for quick traversal, only making calls to the server for the graph of entities not in the current graph. But, rather than make it require two visits to each entity, it could create the entity if it doesn't exist, or update it if it does exist.
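
A rough sketch of that create-or-update idea, to make it concrete. Everything here is an assumption for illustration: the serialized shape (attributes plus a neighbours list per id) and the upsert and ingest names are not part of Graph.js.

```javascript
// Hypothetical sketch; the serialized shape and function names are assumptions.
function upsert(entities, id, attributes) {
  // Create a placeholder on first sight, then merge whatever is known so far.
  if (!entities.has(id)) {
    entities.set(id, { id: id, attributes: {}, neighbours: new Set() });
  }
  const entity = entities.get(id);
  Object.assign(entity.attributes, attributes || {});
  return entity;
}

function ingest(entities, serialized) {
  for (const [id, record] of Object.entries(serialized)) {
    const entity = upsert(entities, id, record.attributes);
    for (const neighbourId of record.neighbours || []) {
      // A neighbour that has not been read yet becomes an empty placeholder,
      // which is filled in when its own record is reached.
      entity.neighbours.add(upsert(entities, neighbourId));
    }
  }
  return entities;
}

const db = ingest(new Map(), {
  a: { attributes: { name: 'A' }, neighbours: ['b'] },
  b: { attributes: { name: 'B' }, neighbours: ['a'] }
});

// Re-ingesting a partial graph updates existing entities instead of duplicating them.
ingest(db, { a: { attributes: { name: 'A2' } } });
```

Each record is visited once; the same ingest call would work for both the initial load from a cache and later partial responses from the server.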

jmfarp2011 commented 9 years ago

Daniel,

I'm working on the regex filter now, and I'm curious whether a simple function, function(a, b){ return a.search(b) >= 0; }, which would let you invoke it as db.query().filter({key__regex: /idk/});, would suffice for your needs. I have it working with English text, but I'm not sure how to test it with Korean vocabulary.
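
For trying this against Korean text, a small stand-alone check along these lines might help. It does not use Graph.js at all: matchesRegex and filterByRegex are hypothetical stand-ins for the filter step, and the word list is made up for the test.

```javascript
// Hypothetical stand-in for the filter step; not the Graph.js implementation.
function matchesRegex(value, pattern) {
  if (value === undefined || value === null) return false;
  // search() returns the match index, or -1 when nothing matches,
  // so a match at position 0 still has to count as a hit.
  return String(value).search(pattern) !== -1;
}

function filterByRegex(entities, key, pattern) {
  return entities.filter(function (e) { return matchesRegex(e[key], pattern); });
}

const words = [
  { word: '학교', meaning: 'school' },
  { word: '학생', meaning: 'student' },
  { word: '친구', meaning: 'friend' }
];

filterByRegex(words, 'word', /^학/);         // the two entries starting with 학
filterByRegex(words, 'word', /[가-힣]{2}/);  // all three: two Hangul syllables in a row
```

JavaScript regular expressions operate on the string's UTF-16 code units, and precomposed Hangul syllables (U+AC00 to U+D7A3) each fit in one unit, so literal characters and ranges like [가-힣] work without extra flags.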

jmfarp2011 commented 9 years ago

A few updates... The regex feature has been implemented (all branches). Graph.js has been refactored into a modular set of files compilable with grunt (v.2 branch). I also removed the sha1 code from Graph.js; I still use it in my example code, but I supply it as my indexGenerator function (v.2 branch). Graph.Database now also supports supplying a key as the indexGenerator, as an alternative to a generator function (v.2 branch).
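
One possible reading of the key alternative is that a property name can be passed where a function was expected, with the entity's value for that property used as its index. The sketch below illustrates that reading only; toGenerator is a hypothetical name and the v.2 branch may handle it differently.

```javascript
// Hypothetical normalizer; the option name comes from the thread, the rest is assumed.
function toGenerator(indexGenerator) {
  if (typeof indexGenerator === 'function') return indexGenerator;
  if (typeof indexGenerator === 'string') {
    // Treat a string as the name of the property that holds the index.
    return function (entity) { return entity[indexGenerator]; };
  }
  throw new TypeError('indexGenerator must be a function or a property name');
}

const byName = toGenerator('name');
const byTypeAndName = toGenerator(function (e) { return e.type + '' + e.name; });

byName({ type: 'word', name: 'han' });         // 'han'
byTypeAndName({ type: 'word', name: 'han' });  // 'wordhan'
```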

jmfarp2011 / Graph.js

Followup to blog post #1