sunlightlabs / churnalism_us

Spurious Match objects #16

Open dvogel opened 11 years ago

dvogel commented 11 years ago

The sitemap is generating URLs for matches that don't pass the filter thresholds, so those URLs return 404. Why and how are these Match objects created?

dvogel commented 11 years ago

These are the result of superfastmatch returning inconsistent results over time. The most likely explanation is that the search document now matches more documents than it used to, so the original match is now excluded by the -num_results option. There is a much smaller chance that all of the fragments that previously matched are now ignored due to the -max_posting_threshold option.
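
To illustrate the first explanation, here is a toy sketch of how a cap on the number of results can orphan a previously stored match once more documents enter the corpus. It does not call superfastmatch: `search`, `Result`, `stale_matches`, the document names, and the scores are all hypothetical stand-ins, and the ranking is assumed to be a simple top-N by score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    doc_id: str
    score: int  # hypothetical similarity score; higher is better


def search(corpus_scores, num_results):
    """Toy stand-in for a capped search: keep only the top `num_results` by score."""
    ranked = sorted(corpus_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [Result(doc_id, score) for doc_id, score in ranked[:num_results]]


def stale_matches(stored_doc_ids, current_results):
    """Return stored matches that the current search no longer returns."""
    current = {r.doc_id for r in current_results}
    return sorted(d for d in stored_doc_ids if d not in current)


# First search: both documents fit under the cap, so both get stored as matches.
corpus = {"press-release-a": 40, "press-release-b": 25}
stored = [r.doc_id for r in search(corpus, num_results=2)]

# Later the corpus grows, and stronger matches push the originals out of the top 2.
corpus.update({"press-release-c": 60, "press-release-d": 55})
later = search(corpus, num_results=2)

print(stale_matches(stored, later))
```

Under these assumptions the stored matches still exist, but a fresh search no longer returns them, which is the kind of record that would later fail the threshold check and 404.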