netwerk-digitaal-erfgoed / registry-demo

Demonstrator of the Dataset Register.
https://datasetregister.netwerkdigitaalerfgoed.nl/

language filtering leads to incomplete search results #13

coret closed this issue 1 year ago

coret commented 1 year ago

When searching for "nta" in the Dutch version of the demonstrator you get 4 results; the same search in the English version returns only 3.

The SPARQL query uses the language of the GUI to filter on the language of the title and description, e.g. FILTER(LANG(?title) = "" || LANGMATCHES(LANG(?title), "en")). The NTA/KB dataset has only a title@nl and description@nl, so it is not selected in the English version.
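To make the effect concrete, here is a minimal Python sketch of that filter. It is not the demonstrator's code: the dataset names and titles are made up, and lang_matches is a simplified stand-in for LANGMATCHES (basic prefix matching only, no "*" range).

```python
from typing import Dict, List, NamedTuple


class Literal(NamedTuple):
    value: str
    lang: str  # "" means the literal has no language tag


def lang_matches(tag: str, lang_range: str) -> bool:
    """Simplified LANGMATCHES: exact tag or a subtag such as nl-BE."""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def passes_filter(title: Literal, gui_lang: str) -> bool:
    # Mirrors FILTER(LANG(?title) = "" || LANGMATCHES(LANG(?title), gui_lang))
    return title.lang == "" or lang_matches(title.lang, gui_lang)


# Hypothetical data: one Dutch-only title, one bilingual, one untagged.
TITLES: Dict[str, List[Literal]] = {
    "dataset-a": [Literal("NTA thesaurus", "nl")],
    "dataset-b": [Literal("Fonta collection", "en"), Literal("Fonta collectie", "nl")],
    "dataset-c": [Literal("Untagged nta list", "")],
}


def search(term: str, gui_lang: str) -> List[str]:
    """Return datasets with at least one title that survives the filter and contains term."""
    return sorted(
        dataset
        for dataset, titles in TITLES.items()
        if any(passes_filter(t, gui_lang) and term in t.value.lower() for t in titles)
    )


print(search("nta", "nl"))
print(search("nta", "en"))
```

With this sample data the Dutch search finds all three datasets, while the English search misses dataset-a because its only title is tagged nl.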

coret commented 1 year ago

The original query is:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?dataset ?title ?publisherName WHERE {
    ?dataset a dcat:Dataset ;
             dct:title ?title ;
             dct:publisher ?publisher .
    ?publisher foaf:name ?publisherName .
    FILTER(LANG(?title) = "" || LANGMATCHES(LANG(?title), "nl"))
    FILTER(LANG(?publisherName) = "" || LANGMATCHES(LANG(?publisherName), "nl"))
    FILTER CONTAINS(LCASE(?title),"nta") .
} ORDER BY ?title

An alternative version, using COALESCE, is:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?dataset ?title ?publisherName WHERE {
    ?dataset a dcat:Dataset ;
             dct:publisher ?publisher .
    OPTIONAL {
        ?dataset dct:title ?nltitle .
        FILTER(langMatches(lang(?nltitle),"nl") )
    }
    OPTIONAL {
        ?dataset dct:title ?__title .
        FILTER(lang(?__title) = "")
    }
    OPTIONAL {
        ?dataset dct:title ?entitle .
        FILTER(langMatches(lang(?entitle),"en") )
    }
    BIND( COALESCE( ?nltitle, ?__title, ?entitle ) as ?title )
    OPTIONAL {
        ?publisher foaf:name ?nlpublisherName .
        FILTER(langMatches(lang(?nlpublisherName),"nl") )
    }
    OPTIONAL {
        ?publisher foaf:name ?__publisherName .
        FILTER(lang(?__publisherName) = "")
    }
    OPTIONAL {
        ?publisher foaf:name ?enpublisherName .
        FILTER(langMatches(lang(?enpublisherName),"en") )
    }
    BIND( COALESCE( ?nlpublisherName, ?__publisherName, ?enpublisherName ) as ?publisherName )
    FILTER CONTAINS(LCASE(?title),"nta") .
} ORDER BY ?title

This is a better solution because the number of search results no longer differs between languages, but it is bloated and more complex for the casual dataset searcher making their first SPARQL query...
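The fallback that the OPTIONAL blocks plus COALESCE express can be sketched in a few lines of Python. This is only an illustration of the preference order (Dutch, then untagged, then English); the function names and sample titles are hypothetical, and unlike SPARQL it returns a single value rather than one row per matching literal.

```python
from typing import Optional, Sequence, Tuple

Literal = Tuple[str, str]  # (text, language tag); "" means untagged


def matches(tag: str, wanted: str) -> bool:
    """Match an untagged literal for wanted == "", else exact tag or subtag."""
    if wanted == "":
        return tag == ""
    tag, wanted = tag.lower(), wanted.lower()
    return tag == wanted or tag.startswith(wanted + "-")


def coalesce_literal(literals: Sequence[Literal], preference: Sequence[str]) -> Optional[str]:
    """Return the first literal found when trying each preferred language in order."""
    for wanted in preference:
        for text, tag in literals:
            if matches(tag, wanted):
                return text
    return None  # like COALESCE with every argument unbound


dutch_gui = ("nl", "", "en")    # order used in the query above
english_gui = ("en", "", "nl")  # assumed order for the English GUI

nta_titles = [("NTA thesaurus", "nl")]
print(coalesce_literal(nta_titles, dutch_gui))
print(coalesce_literal(nta_titles, english_gui))
```

Both calls return the Dutch title, so the dataset stays in the result set regardless of the GUI language; only the preferred display language changes.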

coret commented 1 year ago

@ddeboer is this use of COALESCE what you had in mind?

ddeboer commented 1 year ago

More or less, yes, but in this case the query can be simplified by removing COALESCE:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT * WHERE {
    ?dataset a dcat:Dataset ;
             dct:publisher ?publisher .

    OPTIONAL {
        ?dataset dct:title ?title
        FILTER(langMatches(lang(?title), "nl"))        
    }

    OPTIONAL {
        ?dataset dct:title ?title
        FILTER(langMatches(lang(?title), "en"))        
    }

    OPTIONAL {
        ?dataset dct:title ?title
    }    

    OPTIONAL {
        ?publisher foaf:name ?publisherName
        FILTER(langMatches(lang(?publisherName), "nl"))                
    }    

    OPTIONAL {
        ?publisher foaf:name ?publisherName
        FILTER(langMatches(lang(?publisherName), "en"))                
    }    

    OPTIONAL {
        ?publisher foaf:name ?publisherName
    }   
}
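Why this works without COALESCE: each OPTIONAL is a left join, so once ?title is bound by an earlier block, a later block can only match the same value and otherwise leaves the row untouched. Below is a rough Python model of that behaviour for a single variable. It is a simplification of SPARQL's semantics, with hypothetical helper names and sample titles.

```python
from typing import Dict, List, Sequence, Tuple

Literal = Tuple[str, str]      # (text, language tag); "" means untagged
Solution = Dict[str, Literal]  # variable name -> bound literal


def lang_matches(tag: str, lang_range: str) -> bool:
    """Simplified langMatches: exact tag or a subtag such as nl-BE."""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


def optional(solutions: List[Solution], var: str, candidates: Sequence[Literal]) -> List[Solution]:
    """Model OPTIONAL { ... ?var ... } as a left join on one variable."""
    result: List[Solution] = []
    for sol in solutions:
        if var in sol:
            # Already bound: a different value is incompatible, so the row is kept unchanged.
            result.append(sol)
        else:
            extended = [{**sol, var: c} for c in candidates]
            # No match at all: keep the row with the variable still unbound.
            result.extend(extended or [sol])
    return result


def resolve_title(titles: Sequence[Literal]) -> List[Solution]:
    rows: List[Solution] = [{}]
    for lang in ("nl", "en"):  # first two OPTIONAL blocks
        rows = optional(rows, "title", [t for t in titles if lang_matches(t[1], lang)])
    return optional(rows, "title", list(titles))  # third, unfiltered OPTIONAL


print(resolve_title([("English", "en"), ("Nederlands", "nl")]))
print(resolve_title([("Deutsch", "de"), ("Français", "fr")]))
```

The first call yields a single row with the Dutch title. The second yields two rows, one per title, because the unfiltered OPTIONAL matches every remaining literal; a dataset with several titles in none of the preferred languages may therefore still appear more than once.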