Closed nmonath closed 10 years ago
Problem: Document Similarity -- goal would be to have a function which acts as a metric to determine how similar two documents are. Use this function both for typical query (given document, find similar ones) and also given two sets of documents find which are similar.
Now we need to do research on what has been done on this in the past!
Understand the problem we are trying to solve?
The general set up of experiments?
The evaluation measures?
How to use Syntax and other approaches besides bag of words? Our idea is to use statistics with thematic roles