Currently the front-end gets weird results from the back-end:
Some articles are missing (e.g. the top one for the "vaccines" query)
The same article appears more than once (so that clicking on one selects both)
There are near-duplicates, e.g. URLs differing only in 'http' vs 'https' or in an escaped & character
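
One way to catch the near-duplicates listed above is to compare articles on a normalized URL rather than the raw string. The following is only a sketch: `normalize_url` is a hypothetical helper, not something that exists in the back-end, and it assumes that 'http' and 'https' variants point to the same article, that the escaped form is the literal `&amp;`, and that URL fragments can be ignored.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Return a comparison key for a URL (hypothetical helper, for illustration only)."""
    # Assumption: the escaped ampersand appears as the literal "&amp;".
    cleaned = url.strip().replace("&amp;", "&")
    parts = urlsplit(cleaned)
    # Assumption: http and https variants refer to the same article.
    scheme = parts.scheme.lower()
    if scheme in ("http", "https"):
        scheme = "https"
    # Host names are case-insensitive; the fragment is dropped (assumption).
    return urlunsplit((scheme, parts.netloc.lower(), parts.path, parts.query, ""))
```

Two URLs that normalize to the same key could then be treated as copies of one article, for example when assigning group_id.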
In the article table, the column group_id is meant to identify multiple copies of the same article (articles with the same title). Lucene should index only one article among those with the same group_id.
The Lucene search function has a duplicate filter to avoid having duplicate results.
One or both of the above must have broken in the new version.
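
To make the intended behaviour of the duplicate filter concrete, here is a minimal sketch of filtering a ranked result list by group_id. It is not the actual Lucene code: `dedupe_by_group` and the dictionary layout of a hit are assumptions made for illustration, and it simply keeps the highest-ranked hit for each group_id.

```python
def dedupe_by_group(hits):
    """Keep the first (highest-ranked) hit per group_id, preserving order.

    `hits` is assumed to be a list of dicts that each carry a "group_id" key.
    """
    seen = set()
    unique = []
    for hit in hits:
        group_id = hit["group_id"]
        if group_id in seen:
            # A copy of this article was already returned; skip it.
            continue
        seen.add(group_id)
        unique.append(hit)
    return unique
```

A check along these lines, run against the raw back-end response for the "vaccines" query, could help tell whether the duplicates come from the index (more than one article indexed per group_id) or from the search-time filter.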