Open KamasamaK opened 7 years ago
No, lexographic insertion order is not a requirement of DAWGs in general. However most implementations, including this one, add the constraint because it considerably simplifies implementation and makes the structure faster to construct
This paper describes a more general approach that allows any insertion order: http://www.aclweb.org/anthology/J00-1002.pdf
Thanks for the information. Before I unnecessarily attempt to implement it, are you aware of any JS/TS implementations that allow insertion in any order? The data I am working with changes often, and even with a performance hit, I assume it will still be faster than having the rebuild the entire structure every time (in addition to also having to maintain a sorted set with which to construct it).
For context, I am trying to implement a WorkspaceSymbolProvider
for VS Code that will return partial matches and this seemed to be the most sensible data structure for that.
No, I didn't find any DAWG implementations in JavaScript that allow that.
Depending on how much data you're working with though, you may just want to use a trie and call it a day. DAWGs work great when you have a lot of strings and need to minimize memory usage. I created this library to store a complete set of urban dictionary entries in memory for fast lookup (word level instead of character level). That data set consisted of something like a million and a half unique entries
Thanks. I am using a Trie now and it works well for finding prefixes, but I was hoping to be able to find arbitrary substrings as well.
Is inserting into a DAWG inherently required to be in lexicographic order or is that specific to this implementation? If not, would there be a large performance hit?