mattbierner / dawg-set

Javascript directed acyclic word graph (DAWG)
14 stars 1 forks source link

Insert order #1

Open KamasamaK opened 7 years ago

KamasamaK commented 7 years ago

Is inserting into a DAWG inherently required to be in lexicographic order or is that specific to this implementation? If not, would there be a large performance hit?

mattbierner commented 7 years ago

No, lexographic insertion order is not a requirement of DAWGs in general. However most implementations, including this one, add the constraint because it considerably simplifies implementation and makes the structure faster to construct

This paper describes a more general approach that allows any insertion order: http://www.aclweb.org/anthology/J00-1002.pdf

KamasamaK commented 7 years ago

Thanks for the information. Before I unnecessarily attempt to implement it, are you aware of any JS/TS implementations that allow insertion in any order? The data I am working with changes often, and even with a performance hit, I assume it will still be faster than having the rebuild the entire structure every time (in addition to also having to maintain a sorted set with which to construct it).

For context, I am trying to implement a WorkspaceSymbolProvider for VS Code that will return partial matches and this seemed to be the most sensible data structure for that.

mattbierner commented 7 years ago

No, I didn't find any DAWG implementations in JavaScript that allow that.

Depending on how much data you're working with though, you may just want to use a trie and call it a day. DAWGs work great when you have a lot of strings and need to minimize memory usage. I created this library to store a complete set of urban dictionary entries in memory for fast lookup (word level instead of character level). That data set consisted of something like a million and a half unique entries

KamasamaK commented 7 years ago

Thanks. I am using a Trie now and it works well for finding prefixes, but I was hoping to be able to find arbitrary substrings as well.