Open damevski opened 7 years ago
Here is a way this could work:
1) Divide the retrieved results into two buckets based on their timestamps: older posts (already in the dump) and newer posts.
2) Compute idf on the terms in each bucket separately, for each retrieved result.
3) When the idf of numerous terms in the older-post bucket becomes similar to what we computed from the dump, it signifies that we have a good sample of documents to work with. We can then use the idf of the terms from the newer posts to update the dictionary.
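A rough sketch of the three steps in Python, to make the idea concrete. Everything here is an assumption rather than existing project code: the post layout (`timestamp`, `tokens`), the smoothed idf formula, the "mean absolute difference" similarity test, the `min_terms` and `tolerance` thresholds, and the choice to overwrite dictionary entries with the newer-bucket idf. It also assumes the dump idf was computed with the same formula.

```python
import math
from collections import Counter


def idf(docs):
    """Smoothed idf per term; docs is a list of token lists."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def split_by_cutoff(posts, cutoff):
    """Step 1: older posts (covered by the dump) vs. newer posts."""
    older = [p["tokens"] for p in posts if p["timestamp"] <= cutoff]
    newer = [p["tokens"] for p in posts if p["timestamp"] > cutoff]
    return older, newer


def sample_is_representative(older_idf, dump_idf, min_terms, tolerance):
    """Step 3 check: do enough shared terms have idf close to the dump's?"""
    shared = [t for t in older_idf if t in dump_idf]
    if len(shared) < min_terms:
        return False
    mean_diff = sum(abs(older_idf[t] - dump_idf[t]) for t in shared) / len(shared)
    return mean_diff <= tolerance


def maybe_update_dictionary(dump_idf, posts, cutoff, min_terms=50, tolerance=0.1):
    """Return an updated copy of the dictionary, or dump_idf unchanged."""
    older, newer = split_by_cutoff(posts, cutoff)
    if not older or not newer:
        return dump_idf
    # Step 2: idf is computed on each bucket separately.
    if not sample_is_representative(idf(older), dump_idf, min_terms, tolerance):
        return dump_idf
    updated = dict(dump_idf)
    updated.update(idf(newer))  # assumption: newer idf simply overwrites
    return updated
```

Here `cutoff` would be the creation date of the latest post in the dump. Overwriting is the simplest update policy; blending the old and new idf values would be a gentler alternative.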
The dictionary we use to create queries is based on the SO dump, which is several months old. Can we cleverly update this dictionary based on the retrieved results over time?