lucidworks / auto-phrase-tokenfilter

Lucene Auto Phrase TokenFilter implementation

Optimize token filter and qparser for a 4x increase in throughput #28

Open shalinmangar opened 7 years ago

shalinmangar commented 7 years ago

This pull request optimizes the token filter and qparser for a 4x increase in throughput, mostly by cutting down on copies of data and the CPU time spent performing them. Most places that previously used char[] now use String. Although this looks like it should be less efficient, the master code had to create Strings from those char[] values in many places, which was slow and wasteful. Similarly, I use LinkedList instead of ArrayList for unusedTokens so that the removeFirst operation is O(1).
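To illustrate the unusedTokens change, here is a minimal sketch, not the code from this PR. The class name `UnusedTokensSketch` and both method names are hypothetical; only `unusedTokens`, `LinkedList`, `ArrayList`, and `removeFirst` come from the description above. It drains a queue of tokens from the front in two ways: `LinkedList.removeFirst()` unlinks the head node in constant time, while `ArrayList.remove(0)` has to shift every remaining element on each call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class UnusedTokensSketch {

    // Drains tokens front to back with LinkedList.removeFirst(): O(1) per removal.
    static List<String> drainLinked(List<String> tokens) {
        LinkedList<String> unusedTokens = new LinkedList<>(tokens);
        List<String> out = new ArrayList<>(tokens.size());
        while (!unusedTokens.isEmpty()) {
            out.add(unusedTokens.removeFirst());
        }
        return out;
    }

    // Same drain with ArrayList.remove(0): each removal shifts the
    // remaining elements left, so it is O(n) per removal.
    static List<String> drainArray(List<String> tokens) {
        ArrayList<String> unusedTokens = new ArrayList<>(tokens);
        List<String> out = new ArrayList<>(tokens.size());
        while (!unusedTokens.isEmpty()) {
            out.add(unusedTokens.remove(0));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample tokens, chosen only for illustration.
        List<String> tokens = Arrays.asList("new", "york", "city");
        System.out.println(drainLinked(tokens));
        System.out.println(drainArray(tokens));
    }
}
```

Both methods return the tokens in their original order; the difference is only in the cost per removal, which matters when the filter pulls from the front of the list for every token it processes.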

This PR also adds a .gitignore file.