metaphacts / semopenalex


Filter + Regex does not work; Any alternative to do fuzzy search? #93

Closed (yileitu closed this issue 4 months ago)

yileitu commented 4 months ago

I am trying to search for a particular paper. The following query runs perfectly:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX soa: <https://semopenalex.org/ontology/>
PREFIX terms: <http://purl.org/dc/terms/>

SELECT ?work  WHERE {
  ?work rdf:type soa:Work .
  ?work terms:title "Compressive Rendering: A Rendering Application of Compressed Sensing" .
}

However, the title match is case-sensitive and has to be exact, so I tried to use FILTER and REGEX to relax it:

SELECT ?work WHERE {
  ?work rdf:type soa:Work .
  ?work terms:title ?title .
  FILTER regex(?title, "compressive rendering: a rendering application of compressed", "i")
}

Here I deleted the last word, "Sensing", made all letters lowercase, and used FILTER + REGEX, but executing the query raises a backend error.


Do you support the FILTER operation? Are there any other ways to do substring matching, or even fuzzy (case-insensitive) matching?

davidlamprecht commented 4 months ago

Hello @yileitu, the easiest way to do string-based fuzzy matching to find a specific paper is to use the entitylookup service. The following query works for your use case:


PREFIX Service: <http://www.metaphacts.com/ontologies/platform/service/>
PREFIX entitylookup: <http://www.metaphacts.com/ontologies/platform/service/entitylookup/>
PREFIX soa: <https://semopenalex.org/ontology/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?subject ?type WHERE {
  SERVICE Service:entityLookup {
    ?subject entitylookup:entityName "compressive rendering: a rendering application of compressed";
      entitylookup:candidateType soa:Work;
      entitylookup:type ?type;
      entitylookup:limit 10 ;
      entitylookup:score ?score;
      entitylookup:rank ?rank.
  }
}
ORDER BY DESC (?score) DESC (?rank)
LIMIT 10
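
If the lookup needs to be run for many different titles, the query above can be generated from a small script. The following is a minimal Python sketch, not an official client: the helper names `escape_sparql_string` and `build_entity_lookup_query` are made up for illustration, and the script only builds the query text (sending it to the SPARQL endpoint is left out). It escapes quotes and backslashes so that a title containing them does not break the string literal.

```python
# Hypothetical helpers (not part of SemOpenAlex or metaphacts): they only
# build the entity-lookup query text shown above for an arbitrary search string.

QUERY_TEMPLATE = """\
PREFIX Service: <http://www.metaphacts.com/ontologies/platform/service/>
PREFIX entitylookup: <http://www.metaphacts.com/ontologies/platform/service/entitylookup/>
PREFIX soa: <https://semopenalex.org/ontology/>
SELECT ?subject ?type WHERE {{
  SERVICE Service:entityLookup {{
    ?subject entitylookup:entityName "{name}";
      entitylookup:candidateType soa:Work;
      entitylookup:type ?type;
      entitylookup:limit {limit};
      entitylookup:score ?score;
      entitylookup:rank ?rank.
  }}
}}
ORDER BY DESC (?score) DESC (?rank)
LIMIT {limit}
"""


def escape_sparql_string(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_entity_lookup_query(name: str, limit: int = 10) -> str:
    """Return the entity-lookup query for `name`, capped at `limit` results."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.format(name=escape_sparql_string(name), limit=int(limit))


# Example: a search string that contains double quotes.
query = build_entity_lookup_query('compressive rendering: a "rendering" application')
print(query)
```

The unused `skos` prefix from the original query is omitted here; everything else mirrors the query as posted.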

VladimirAlexiev commented 2 months ago

@yileitu The regex query cannot work because it fetches all 250M titles and then checks them one by one against the regex.
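
As a rough analogy only (a triple store's indexes are not literally a Python dictionary, and the work IDs below are invented), the difference between the two queries can be sketched like this: an exact title match is a single index lookup, while a regex filter has to touch every title.

```python
import re

# Toy stand-in for the title data; the real graph holds about 250M titles.
# The "work-N" identifiers are hypothetical.
titles = {
    "Compressive Rendering: A Rendering Application of Compressed Sensing": "work-1",
    "Compressed Sensing": "work-2",
    "A Survey of Rendering Techniques": "work-3",
}


def exact_lookup(title):
    """Exact match: one keyed lookup, independent of how many titles exist."""
    return titles.get(title)


def regex_scan(pattern):
    """Regex filter: every title is tested, so the cost grows with the data."""
    compiled = re.compile(pattern, re.IGNORECASE)
    checked = 0
    hits = []
    for title, work in titles.items():
        checked += 1
        if compiled.search(title):
            hits.append(work)
    return hits, checked


hits, checked = regex_scan("compressive rendering: a rendering application of compressed")
print(hits, "found after checking", checked, "titles")
```

With three titles the scan is instant; with hundreds of millions, the same loop is what makes the FILTER query impractical.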