impresso / impresso-frontend

🚀 The frontend application of the Impresso WebApp http://impresso-project.ch/app
GNU Affero General Public License v3.0

Significantly lower number of search results between impresso and eLuxemburgensia #419

Closed: mduering closed 4 years ago

mduering commented 5 years ago

A query for el alamein in impresso yields 5 results: https://dev.impresso-project.ch/app/#/search?f=%5B%7B%22type%22%3A%22string%22,%22q%22%3A%22el%20alamein%22,%22precision%22%3A%22fuzzy%22%7D,%7B%22type%22%3A%22country%22,%22q%22%3A%5B%22LU%22%5D,%22op%22%3A%22OR%22%7D%5D&g=articles&p=1&o=-relevance

in eLuxemburgensia it's 261: http://www.eluxemburgensia.lu/R/ST6RQLLBNBXX5NPDU9LG5PSGG9L8YJG3X76SSDU5R4NRQ3T49R-01654

Tried with other keywords: always far fewer results in impresso for Lux newspapers.

For Swiss newspapers it seems OK: a quick and very messy comparison between impresso and https://www.e-newspaperarchives.ch, not controlled for the same set of newspapers, yields roughly similar numbers.

danieleguido commented 5 years ago

@mduering we get 5 results because we erroneously add a language filter on SOLR: apparently we were searching content_txt_fr only! However, if we extend the query to the correct fields, /solr/impresso_dev/select?q=filter(content_length_i:[1%20TO%2010000])%20AND%20(content_txt_en:%22el%20alamein%22%20OR%20content_txt_fr:%22el%20alamein%22%20OR%20content_txt_de:%22el%20alamein%22)%20AND%20filter(meta_country_code_s:LU)&facet=true&facet.field=lg_s, we get "only" 187 results. Article languages:

German (164 articles) 
Luxembourgish (16 articles) 
French (4 articles)  

(Also, the article count summed over the language facet is 184!! The mystery deepens.) To be discussed with @e-maud?
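To make the extended query and the facet mismatch easier to reproduce, here is a minimal Python sketch. It rebuilds the select parameters quoted above (same field names, same `filter(...)` clauses) and computes how many hits the `lg_s` facet leaves unaccounted for. The helper names `build_params` and `unaccounted` are hypothetical and not part of the impresso codebase, and the facet counts are simply the numbers reported in this comment.

```python
from urllib.parse import urlencode

# Language-specific content fields taken from the query quoted in this thread.
LANG_FIELDS = ["content_txt_en", "content_txt_fr", "content_txt_de"]


def build_params(phrase, country="LU", fields=LANG_FIELDS):
    """Build Solr select parameters that search the phrase in every listed field.

    Hypothetical helper: it only mirrors the query string shown in the comment.
    """
    text_clause = " OR ".join(f'{field}:"{phrase}"' for field in fields)
    q = (
        "filter(content_length_i:[1 TO 10000])"
        f" AND ({text_clause})"
        f" AND filter(meta_country_code_s:{country})"
    )
    return {"q": q, "facet": "true", "facet.field": "lg_s"}


def unaccounted(num_found, facet_counts):
    """Return the number of hits that no facet bucket covers."""
    return num_found - sum(facet_counts.values())


params = build_params("el alamein")
query_string = urlencode(params)  # percent-encoded, ready to append after "select?"

# Counts as reported above: 187 results, facet buckets of 164 + 16 + 4.
gap = unaccounted(187, {"German": 164, "Luxembourgish": 16, "French": 4})
print(query_string)
print(gap)  # 3
```

One possible explanation for the 3 missing articles, not confirmed anywhere in this thread, is that they carry no `lg_s` value: by default Solr field facets do not report documents that lack the faceted field unless `facet.missing=true` is set.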

mduering commented 5 years ago

Hi @danieleguido and @e-maud,

I just made a little test searching for "annweiler" (my hometown).

impresso yields 12 results for Lux newspapers, eLuxemburgensia 17.

Looking at the differences, I found this:

impresso finds an article in Luxemburger Wort from 23.6.1933 which is not found in eLux

eLux finds these, which are not found by impresso:

Der Volksfreund 27.6.1949
L'Union 24.9.1867
Escher Tageblatt 24.3.1936
Obermoselzeitung 18.10.1892
Der Arbeiter (not in impresso, but shouldn't it be?) 15.4.1882
Lux Wort 6.2.1932
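A manual comparison like this one can be sketched as a set difference over (newspaper, date) pairs. The snippet below is only an illustration: `compare_hits` is a hypothetical helper, the dates are rewritten as ISO strings on the assumption that the thread uses day.month.year, and only the differing hits are included because the shared hits are not listed here.

```python
def compare_hits(impresso_hits, elux_hits):
    """Split two result lists into hits unique to each archive and shared hits."""
    impresso_set, elux_set = set(impresso_hits), set(elux_hits)
    return {
        "only_impresso": sorted(impresso_set - elux_set),
        "only_elux": sorted(elux_set - impresso_set),
        "shared": sorted(impresso_set & elux_set),
    }


# Hits for "annweiler" that differ between the two archives, as listed above.
impresso_hits = [("Luxemburger Wort", "1933-06-23")]
elux_hits = [
    ("Der Volksfreund", "1949-06-27"),
    ("L'Union", "1867-09-24"),
    ("Escher Tageblatt", "1936-03-24"),
    ("Obermoselzeitung", "1892-10-18"),
    ("Der Arbeiter", "1882-04-15"),
    ("Lux Wort", "1932-02-06"),
]

diff = compare_hits(impresso_hits, elux_hits)
for title, date in diff["only_elux"]:
    print(f"missing in impresso: {title} {date}")
```

The reported totals are consistent with this breakdown: 12 impresso hits minus the 1 unique to impresso leaves 11 shared hits, and 11 plus the 6 found only by eLux gives the 17 eLux results.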