mantono / DuplicateSearcher

Identification of Duplicate Tickets in Issue Tracking Systems for Software Development

Analysis #11

Status: closed (mantono closed this issue 8 years ago)

mantono commented 8 years ago

What should be done? Analyze all issues by comparing them to each other (not necessarily all n² pairs).

Input: A set of issues.
Output: All issues that are duplicates of another issue in the input set.

Issues it depends on: #4
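The input/output contract above can be sketched as follows. This is a minimal, hypothetical illustration, not the project's implementation: it assumes each issue is reduced to a set of lower-case word tokens, uses the Jaccard index as the similarity measure, and treats an assumed threshold of 0.5 as "duplicate". The names `tokenize`, `jaccard` and `find_duplicates` are made up for this sketch. To avoid blindly comparing all n² pairs, an inverted index restricts comparisons to issues that share at least one token.

```python
import itertools
import re
from collections import defaultdict


def tokenize(text):
    """Hypothetical tokenizer: set of lower-case alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(issues, threshold=0.5):
    """Return ids of issues that duplicate an earlier issue.

    `issues` maps issue id -> text. Ids are assumed to be orderable
    (e.g. issue numbers), so the lower id counts as the original.
    `threshold` is an assumed cut-off, not a tuned value.
    """
    tokens = {iid: tokenize(text) for iid, text in issues.items()}

    # Inverted index: token -> ids of issues containing that token.
    index = defaultdict(set)
    for iid, toks in tokens.items():
        for tok in toks:
            index[tok].add(iid)

    # Candidate pairs share at least one token; all other pairs have
    # Jaccard index 0 and can never reach a positive threshold.
    candidates = set()
    for ids in index.values():
        for a, b in itertools.combinations(sorted(ids), 2):
            candidates.add((a, b))

    duplicates = set()
    for a, b in candidates:
        if jaccard(tokens[a], tokens[b]) >= threshold:
            duplicates.add(b)  # b > a, so b is the later issue
    return duplicates


issues = {
    1: "App crashes on startup",
    2: "app crashes at startup",
    3: "Add dark mode",
}
print(find_duplicates(issues))  # {2}
```

Issues 1 and 2 share 3 of 5 distinct tokens (Jaccard 0.6), so issue 2 is reported as a duplicate. Note that very common words still link many issues together, so in the worst case the candidate set is still close to n²; removing stop words before indexing is one way to shrink it.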

mantono commented 8 years ago

Alternatives

  1. Jaccard index
  2. Inverse Document Frequency (IDF)
  3. Term Frequency & Inverse Document Frequency (TF-IDF)
  4. Vector Space Model (using one of the above) with cosine similarity
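Alternatives 2 to 4 can be combined into one vector space model, sketched below under stated assumptions. This is an illustrative sketch, not the project's code: it assumes simple word tokenization, the common IDF variant log(N / df(t)), raw term counts as TF, and sparse vectors stored as dicts. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: list of lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf(docs):
    """Inverse document frequency log(N / df(t)) for every term t."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}


def tf_idf(doc, idf_weights):
    """Sparse TF-IDF vector: raw term count times IDF weight."""
    tf = Counter(doc)
    return {t: c * idf_weights.get(t, 0.0) for t, c in tf.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


texts = ["App crashes on startup", "app crashes at startup", "Add dark mode"]
docs = [tokenize(t) for t in texts]
weights = idf(docs)
vectors = [tf_idf(d, weights) for d in docs]
print(cosine(vectors[0], vectors[1]))  # about 0.29
print(cosine(vectors[0], vectors[2]))  # 0.0, no shared terms
```

Using IDF alone (alternative 2) corresponds to replacing the raw count in `tf_idf` with 1 for every present term. One design consequence of this IDF variant: a term that occurs in every issue gets weight log(1) = 0 and therefore contributes nothing to the similarity, which is usually desirable for very common words.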
mantono commented 8 years ago

Found this method by accident; it could be of interest for our Analyzer or any of our FrequencyCounter classes.