krotik / eliasdb

EliasDB is a graph-based database.
Mozilla Public License 2.0

Full Text Search - Documentation of Limits #11

Open erichiller opened 7 years ago

erichiller commented 7 years ago

I haven't seen this in the documentation: is there a maximum string length for attribute values? I followed the tutorial and then inserted nodes with a large-ish string as an attribute intended for Full Text Search (query). I was trying to implement a hybrid of graph and full text search, but insertion alone was taking seconds per (HTML) page.

krotik commented 7 years ago

Hi Eric,

that is an interesting use-case which I am quite interested in myself. There is in general no maximum string length for attribute values.

The index (full text search index) is word based so an input string is split into words (split by whitespace). The words are then stored in two ways:

node attribute name + word -> node keys + word positions
node attribute name + md5 hash of value -> node keys

The first one is for word and phrase search, the second one for value lookup. The code for this can be found under /eliasdb/graph/util/indexmanager.go. Have a look at the unit test to see how this component works...
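
To make the two mappings more concrete, here is a minimal in-memory sketch in Go. It is only a toy model and not the code from indexmanager.go: the toyIndex type, its methods, the key separator and the 1-based word positions are all assumptions made for illustration.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strings"
)

// toyIndex is a hypothetical stand-in for the two mappings described above.
// It is NOT EliasDB's IndexManager; names and layout are assumptions.
type toyIndex struct {
	// attribute name + word -> node key -> word positions
	words map[string]map[string][]int
	// attribute name + md5 hash of value -> node keys
	values map[string][]string
}

func newToyIndex() *toyIndex {
	return &toyIndex{
		words:  make(map[string]map[string][]int),
		values: make(map[string][]string),
	}
}

// valueKey builds the lookup key for a whole attribute value.
func valueKey(attr, value string) string {
	return attr + "\x00" + fmt.Sprintf("%x", md5.Sum([]byte(value)))
}

// index splits the value on whitespace and records both mappings.
// Positions are counted from 1 here, which is an arbitrary choice.
func (t *toyIndex) index(nodeKey, attr, value string) {
	for i, w := range strings.Fields(value) {
		k := attr + "\x00" + w
		if t.words[k] == nil {
			t.words[k] = make(map[string][]int)
		}
		t.words[k][nodeKey] = append(t.words[k][nodeKey], i+1)
	}
	vk := valueKey(attr, value)
	t.values[vk] = append(t.values[vk], nodeKey)
}

// lookupWord returns node keys and word positions for a single word.
func (t *toyIndex) lookupWord(attr, word string) map[string][]int {
	return t.words[attr+"\x00"+word]
}

// lookupValue returns the node keys whose attribute has exactly this value.
func (t *toyIndex) lookupValue(attr, value string) []string {
	return t.values[valueKey(attr, value)]
}

func main() {
	idx := newToyIndex()
	text := "the quick brown fox jumps over the lazy dog"
	idx.index("page1", "text", text)
	fmt.Println(idx.lookupWord("text", "the"))
	fmt.Println(idx.lookupValue("text", text))
}
```

Note that in a model like this every distinct word becomes its own index key, so a page that splits into a few very long "words" produces few but very large keys.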

Since a lot of HTML doesn't have spaces between tags I would imagine that the words get quite long. It might help to chop them up a bit...

The best way forward to narrow down what exactly goes wrong would be to write a unit test/benchmark for the IndexManager with some suitable test data.
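
A benchmark along those lines could be structured roughly like this. The sketch does not call the real IndexManager: indexFunc is a hypothetical hook where the actual indexing call would be plugged in, and the placeholder used in main only splits the value into words. makePage and benchIndex are likewise made-up helper names. The idea is to compare pages with many short words against pages with a few very long words.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// indexFunc is a hypothetical hook; replace the placeholder in main with a
// call into the real indexing code to measure it.
type indexFunc func(nodeKey, attr, value string)

// makePage builds synthetic test data: `words` words of `wordLen` characters
// each, separated by single spaces.
func makePage(words, wordLen int) string {
	w := strings.Repeat("x", wordLen)
	return strings.TrimSpace(strings.Repeat(w+" ", words))
}

// benchIndex measures how long one indexing call takes for the given page.
func benchIndex(index indexFunc, page string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			index(fmt.Sprintf("node%d", i), "text", page)
		}
	})
}

func main() {
	// Placeholder indexer (assumption): only splits the value into words.
	placeholder := func(_, _, value string) { _ = strings.Fields(value) }

	cases := []struct{ words, wordLen int }{
		{10000, 5}, // many short words
		{50, 1000}, // few very long words, as in tag-heavy HTML
	}
	for _, c := range cases {
		r := benchIndex(placeholder, makePage(c.words, c.wordLen))
		fmt.Printf("%d words of length %d: %s\n", c.words, c.wordLen, r)
	}
}
```

Inside the EliasDB repository the same loop would more naturally live in a `_test.go` file as a regular `func BenchmarkXxx(b *testing.B)` run with `go test -bench`.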