mantono / DuplicateSearcher

Identification of Duplicate Tickets in Issue Tracking Systems for Software Development

Performance improvements of processing #40

Closed mantono closed 8 years ago

mantono commented 8 years ago

A general overhaul of IssueProcessor has been made, along with performance improvements. The interface method TokenProcessor.process() now operates on single Tokens rather than on a TermFrequencyCounter. This has in turn enabled caching of processed tokens, so a distinct token never has to be processed more than once. The performance gain from this is somewhere between 7 and 16%.
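
As a rough illustration of the caching idea, here is a minimal sketch. It is not the code from this PR: the class name `CachingTokenProcessor`, the use of `String` as a stand-in for the project's Token type, and the `misses()` counter are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: wraps a per-token processing step and memoizes its results.
// String stands in for the project's Token type (an assumption for this example).
final class CachingTokenProcessor {
    private final UnaryOperator<String> delegate;
    private final Map<String, String> cache = new HashMap<>();
    private int misses = 0;

    CachingTokenProcessor(UnaryOperator<String> delegate) {
        this.delegate = delegate;
    }

    // Runs the delegate only the first time a distinct token is seen.
    // Assumes the delegate never returns null; not thread-safe.
    String process(String token) {
        String cached = cache.get(token);
        if (cached == null) {
            cached = delegate.apply(token);
            cache.put(token, cached);
            misses++;
        }
        return cached;
    }

    // Number of times the delegate actually had to run.
    int misses() {
        return misses;
    }
}
```

A cache like this is only possible when processing is a function of one token at a time, which is presumably why the change from TermFrequencyCounter to single Tokens made it feasible.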

After some profiling, the current bottleneck seems to be LevenshteinDistance.unlimitedCompare(), which accounts for roughly 3/4 of the execution time during the processing of issues when spell correction is performed.
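
For context on why an unbounded edit-distance comparison is costly, the sketch below shows one common way to cut the work short: stop as soon as the distance is certain to exceed a given limit. This is an illustration only, not the project's LevenshteinDistance class; the name `BoundedLevenshtein`, the `limit` parameter, and the `-1` return convention are assumptions.

```java
// Hypothetical sketch of a Levenshtein distance with an early exit.
final class BoundedLevenshtein {

    // Returns the edit distance between a and b, or -1 if it exceeds limit.
    static int distance(String a, String b, int limit) {
        // The length difference is a lower bound on the distance.
        if (Math.abs(a.length() - b.length()) > limit) {
            return -1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // The final distance can never be below the minimum of any row.
            if (rowMin > limit) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()] <= limit ? prev[b.length()] : -1;
    }
}
```

The full table still costs time proportional to the product of the two word lengths in the worst case, so the benefit comes from rejecting clearly dissimilar candidates early.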

This closes #39.