vcu-swim-lab / stack-intheflow

MIT License
Updating the dictionary using search results #83

Open damevski opened 7 years ago

damevski commented 7 years ago

The dictionary we use to create queries is based on the Stack Overflow (SO) dump, which is several months old. Can we cleverly update this dictionary over time, based on the retrieved results?

damevski commented 7 years ago

Here is a way this could work:

1) Divide the retrieved results into two buckets based on their timestamp: older posts (already in the dump) and newer posts.
2) Compute IDF on the terms in both buckets, separately, for each retrieved result.
3) When the IDF of many terms in the older-post bucket becomes similar to the IDF we computed from the dump, it signifies that we have a good sample of documents to work with. We can then use the IDF of the terms from the newer posts to update the dictionary.
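
A rough sketch of the three steps, in Python for brevity. Everything named here is an assumption for illustration, not existing plugin code: the post shape (`created`, `tokens`), the smoothed IDF formula, the mean-absolute-difference similarity check, and the `tolerance` and `min_terms` thresholds. It also assumes the dump IDF was computed with the same formula, and that newer-post IDF values simply overwrite or extend the dump values.

```python
import math
from collections import Counter


def idf(docs):
    """Smoothed IDF per term for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def split_by_cutoff(posts, cutoff):
    """Step 1: older posts (covered by the dump) vs. newer posts."""
    older = [p["tokens"] for p in posts if p["created"] <= cutoff]
    newer = [p["tokens"] for p in posts if p["created"] > cutoff]
    return older, newer


def sample_is_representative(older_idf, dump_idf, tolerance, min_terms):
    """Step 3 check: is older-bucket IDF close to the dump IDF?"""
    shared = [t for t in older_idf if t in dump_idf]
    if len(shared) < min_terms:
        return False  # too few overlapping terms to judge
    gap = sum(abs(older_idf[t] - dump_idf[t]) for t in shared) / len(shared)
    return gap <= tolerance


def update_dictionary(dump_idf, posts, cutoff, tolerance=0.25, min_terms=50):
    """Return a new dictionary; unchanged if the sample is not yet good."""
    older, newer = split_by_cutoff(posts, cutoff)
    if not older or not newer:
        return dict(dump_idf)
    # Step 2: IDF is computed separately for each bucket.
    if not sample_is_representative(idf(older), dump_idf, tolerance, min_terms):
        return dict(dump_idf)
    updated = dict(dump_idf)
    updated.update(idf(newer))  # assumption: newer values overwrite/extend
    return updated
```

Here `cutoff` would be the timestamp of the dump, and `posts` the results accumulated from searches so far. The thresholds would need tuning against real data, since IDF values from a small retrieved sample may differ in scale from those of the full dump.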