mantono / DuplicateSearcher

Identification of Duplicate Tickets in Issue Tracking Systems for Software Development

Use Map to cache result of processing through IssueProcessor #39

Closed mantono closed 8 years ago

mantono commented 8 years ago

Most Tokens processed through the various TokenProcessor classes are probably processed multiple times (unless a Token occurs only once in the entire repository). Caching the result of this processing in a Map for later lookup could improve performance tremendously, since a Map lookup is very quick.

This would mean that the current implementation of IssueProcessor and TokenProcessor would have to change slightly, since right now a TermFrequencyCounter is passed around and processed rather than single tokens, but this should not be too hard to change.
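
A rough sketch of what the caching could look like, assuming each token can be treated as a String and the processing step can be expressed as a `Function<String, String>`. The class name `CachingTokenProcessor` and the `misses()` counter are made up for illustration and are not part of the existing code; the real TokenProcessor interface may look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper, not part of the current DuplicateSearcher code.
// It remembers the result of processing each distinct token so that the
// (possibly expensive) delegate only runs once per distinct token.
final class CachingTokenProcessor {
    private final Function<String, String> delegate;
    private final Map<String, String> cache = new HashMap<>();
    private int misses = 0;

    CachingTokenProcessor(Function<String, String> delegate) {
        this.delegate = delegate;
    }

    String process(String token) {
        // containsKey is used so that a null result (for example a token
        // that the delegate filters out) is cached as well.
        if (cache.containsKey(token)) {
            return cache.get(token);
        }
        String result = delegate.apply(token);
        cache.put(token, result);
        misses++;
        return result;
    }

    // Number of times the delegate actually had to run.
    int misses() {
        return misses;
    }
}
```

Wrapping a processor would then look like `new CachingTokenProcessor(String::toLowerCase)`, and calling `process` twice with the same token would only run the delegate once. A plain HashMap is not thread-safe, so this only works as-is if the processing is single-threaded.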