New types of tree: ConcurrentPermutermTree, ConcurrentWildcardTree for wildcard queries

GoogleCodeExporter commented 9 years ago

It would be useful to support wildcard queries.

Two approaches to be investigated (both of which will be tracked in this issue):

(1) A permuterm index on top of the ConcurrentRadixTree. This would support 
queries such as "<prefix>*<suffix>" on a single tree. It may be more memory 
efficient than a hash-dictionary approach. See: 
http://nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html

(2) A composite of a ConcurrentRadixTree and a ConcurrentReversedRadixTree. One 
tree would support prefix lookup, the other suffix lookup. Query 
"prefix*suffix" may return the intersection of the results from both trees, 
after some post-filtering. This second approach however, is near the territory 
of a query engine on top of multiple indexes, so if implemented would not 
belong in this project, but in http://code.google.com/p/cqengine/

Example usage for (1) would be:

public static void main(String[] args) {
    PermutermTree<Integer> tree = new ConcurrentPermutermTree<Integer>(new DefaultCharArrayNodeFactory());

    tree.put("TEST", 1);
    tree.put("TOAST", 2);
    tree.put("TEAM", 3);

    System.out.println("Keys matching 'T*T': " + Iterables.toString(tree.getKeysMatching("T", "T"))); // prefix, suffix
}

Output would be:
    Keys matching 'T*T': [TOAST, TEST]

Original issue reported on code.google.com by ni...@npgall.com on 24 Mar 2013 at 10:19

GoogleCodeExporter commented 9 years ago

Can a ConcurrentWildcardTree be used for url routing with parameters? like 

/home/products/*/tab/*?color=* ?

Original comment by respriet...@gmail.com on 16 Jan 2015 at 9:19

GoogleCodeExporter commented 9 years ago

I'm planning to take a different/better approach than discussed above to 
support multiple wildcard queries, when I eventually get time for this.

Basically a potentially better way to support multiple wildcard lookup, is to 
set up a tree of InvertedRadixTrees.
Each InvertedRadixTree has a getKeysPrefixing method, which returns keys in the 
tree which match an input string with a trailing wildcard.

If you set up a tree of these InvertedRadixTree, then you can accelerate 
lookups where your url patterns contain any number of wildcards.

Original comment by ni...@npgall.com on 19 Jan 2015 at 1:19

lunaticare / concurrent-trees

New types of tree: ConcurrentPermutermTree, ConcurrentWildcardTree for wildcard queries #3