One drawback of our approach is that it cannot handle "new" terms that did not exist when the dump was indexed. We should come up with a way to handle this.
Current ideas include:
Updating the dictionary based on the results retrieved from StackOverflow, using some measure of entropy to determine when a suitable sample set has been reached.
Querying StackOverflow for the number of questions that contain a term and recalculating our metrics based on that count.
Analyzing the code to determine candidate new terms, inserting them into queries, and validating the result by observing which answers, if any, the user clicks on.
Utilizing the contents of manually issued queries to augment the search?
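The second idea could be sketched roughly as follows, assuming the public Stack Exchange API's /2.3/search endpoint (which is real, including its filter=total option) and a standard IDF-style weighting; the corpus-size constant and function names here are illustrative assumptions, not part of our current implementation:

```python
import json
import math
import urllib.parse
import urllib.request

# Rough size of the StackOverflow question corpus; an assumed placeholder
# that would be replaced by the real count from the indexed dump.
STACKOVERFLOW_TOTAL_QUESTIONS = 24_000_000


def live_question_count(term: str) -> int:
    """Ask the Stack Exchange API how many questions mention `term` in the title.

    Uses the documented /2.3/search endpoint with filter=total, which
    returns just {"total": N}.
    """
    params = urllib.parse.urlencode({
        "intitle": term,
        "site": "stackoverflow",
        "filter": "total",
    })
    url = f"https://api.stackexchange.com/2.3/search?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["total"]


def idf(doc_count: int, corpus_size: int = STACKOVERFLOW_TOTAL_QUESTIONS) -> float:
    """Recalculate an IDF-style weight from a live document count.

    The +1 keeps the weight finite for terms with zero hits, i.e. terms
    newer than the dump.
    """
    return math.log(corpus_size / (1 + doc_count))


# For a term missing from the indexed dump, a provisional weight could be
# derived from a live count instead of treating the term as unknown:
#     weight = idf(live_question_count("some-new-framework"))
```

This would trade one extra API round-trip per unknown term for a usable weight, and the result could be cached in the dictionary so the cost is paid only on first sight of a term.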