Closed sanjeevgour closed 7 years ago
try this:
Trie t = new PatriciaTrie("world");
String target = "theworldisaniceplace";
StringBuilder ret = new StringBuilder();
t.findLongestWord(target, 0, target.length(), ret);
System.out.println("found: " + ret);
Thanks, that works great, appreciate your response. Few more questions- -Can I say stop and return at first match whether it is shortest or longest doesn't matter? -Can I get all matching words from the dictionary I gave as input. For example, if my earlier dictionary also contained the word "nice", I would like to get a list containing "world" and "nice". -Is it possible to return the index of the words found in the input text?
These are all use cases I have for my application.
-Can I say stop and return at first match whether it is shortest or longest doesn't matter?
Try findShortestWord method. That stops search when the first shortest word found while findLongestWord method keeps searching if longer word exists in a dictionary.
-Can I get all matching words from the dictionary I gave as input. For example, if my earlier dictionary also contained the word "nice", I would like to get a list containing "world" and "nice".
You have to write a loop using find***Word method like this:
Trie t = new PatriciaTrie("world", "nice");
String target = "theworldisaniceplace";
List<CharSequence> found = new ArrayList<>();
int pos = 0;
StringBuilder ret = new StringBuilder();
while((pos = t.findLongestWord(target, pos, target.length(), ret)) != -1){
found.add(ret);
pos += ret.length();
ret = new StringBuilder();
}
System.out.println(found);
-Is it possible to return the index of the words found in the input text?
Use the returned value from find***Word method. That indicates the index of found word in input text.
That solved most of my use cases, thanks for quick response and help.
Can I make the search case insensitive?
You're very welcome:)
Can I make the search case insensitive?
No. To do case-insensitive search, you have to store capitalized string and use capitalized query string, or implement a new method for Trie4J in saving original case.
Ok, case-change is what I am doing currently. Thanks.
One more questions. Among the different tree implementations you have, which one would be the least expensive processing wise, in other words which one would take least CPU cycles?
Depends on the function. DoubleArray takes the least computation power for contains method. But it takes more power for predictiveSearch method than PatriciaTrie.
I am using InlinedTailLOUDSTrie and mostly use findShortestWord and find the findLongestWord methods. It appears that this combination takes significant CPU cycles. I am wondering if I could tune it.
DoubleArrays use less CPU time than LOUDSTries in find????Word method. According to the current source code, both DoubleArrays and LOUDSTries seem to have little room to optimize speed by reducing object creation.
I'm benchmarking and optimizing the findShortestWord methods of some tries.
1942.908 ms in 100000call: org.trie4j.patricia.TailPatriciaTrie
2408.908 ms in 100000call: org.trie4j.doublearray.DoubleArray
178.622 ms in 100000call: org.trie4j.doublearray.OptDoubleArray
2716.288 ms in 100000call: org.trie4j.louds.InlinedTailLOUDSTrie
619.349 ms in 100000call: org.trie4j.louds.OptInlinedTailLOUDSTrie
The result shows that the findShortestWord methods of DoubleArray and InlinedTailLOUDSTrie can be significantly improved.
Trie4J 0.9.6 with improvements above is out. Here is the latest bench results: https://github.com/takawitter/trie4j/blob/master/trie4j/benchmarks/20170720-2.6GHzCorei7-OSX10.11.6-jdk1.8.0_111.findWord.txt All improvements for Opt*** classes were integrated to non-Opt classes of 0.9.6 (i.e. improvements for OptDoubleArray were integrated to DoubleArray).
These are great improvements, almost 5 time. :+1:
Is is possible to search a word contained within some text. For example, search the word "world" in the string "theworldisaniceplace"?