w3c / sparql-dev

SPARQL dev Community Group
https://w3c.github.io/sparql-dev/

String matching using wildcards #85

Open afs opened 5 years ago

afs commented 5 years ago

Regular expressions can be complex. Strings with wildcards are simpler.

Proposed solution

Provide string matching using wildcards as an additional alternative to regular expressions by adding a new function. The pattern is anchored: it must match the whole string.

Examples:

MATCH(?string, "abc*")

MATCH(?string, "*abc*", "i") # Case insensitive.

MATCH(?string, "a?c")
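A minimal sketch of the intended semantics, written in Python rather than SPARQL so it can be run: the wildcard pattern is translated into an anchored regular expression, with `*` matching any run of characters and `?` matching exactly one. The helper names `wildcard_to_regex` and `wildcard_match` are hypothetical and not part of any proposal; escaping of literal `*` and `?` is left out because the proposal does not specify it.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern into an anchored regex string (illustrative only)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is a literal
    return "^" + "".join(parts) + "$"


def wildcard_match(string: str, pattern: str, flags: str = "") -> bool:
    """Hypothetical stand-in for MATCH(?string, pattern, flags)."""
    re_flags = re.DOTALL                 # assumption: wildcards also match newlines
    if "i" in flags:
        re_flags |= re.IGNORECASE        # "i" means case insensitive
    return re.fullmatch(wildcard_to_regex(pattern), string, re_flags) is not None


print(wildcard_to_regex("abc*"))                # ^abc.*$
print(wildcard_match("xxABCyy", "*abc*", "i"))  # True
print(wildcard_match("xabc", "abc*"))           # False, because the pattern is anchored
```

Under this reading, `MATCH(?string, "abc*")` would behave like `regex(?string, "^abc.*$")`.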

Previous work

Glob patterns, SQL LIKE, Lucene wildcard searches

Considerations for backward compatibility

None.

lisp commented 5 years ago

why?

ktk commented 5 years ago

@lisp that is a very common scenario in the real world, and right now I have to look up every time how to do it with regex. I teach SPARQL on a regular basis as well; this would definitely make simple string matches easier for users.

cygri commented 5 years ago

If this were added, it should be a different name. When explaining SPARQL, one constantly has to talk about matching—and it usually means matching graph patterns against triples. Having a function called “match” that uses the word in a different sense does not help.

Some possible other names:

FILTER wildcard(?title, "*sparql*", "i")
FILTER like(?title, "*sparql*", "i")

This could also be combined with #34:

?doc :title ~"*sparql*"i.

dbooth-boston commented 5 years ago

It is called glob in several other languages.

lisp commented 5 years ago

if the goal is succinctness, it makes sense to go all the way to something like

?doc :title ~"*sparql*"i.

but

VladimirAlexiev commented 5 years ago

ShEx has a similar construct called Stem. It works for strings, IRIs (also prefixed) and language tags.

afs commented 1 year ago

ShEx Stem is equivalent to fn:starts-with / STRSTARTS.

afs commented 1 year ago

I agree "match" is already used for graph patterns. It is also valuable as a keyword.

LIKE is good, depending on the SQL implications: SQL uses % and _ for what is commonly * and ? in shells and filename matching, and LIKE in some SQL dialects also has character classes and negated character classes.

Filename matching uses glob patterns, where * matches any characters except the component separator, and some systems (e.g. git) add ** to mean "filename, any depth".
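To illustrate how filename-style globbing differs from plain string wildcards, here is a rough Python sketch in which `*` and `?` stop at the component separator while `**` crosses it. The function name `path_glob_to_regex` and the default separator `/` are assumptions for illustration, and real implementations such as git apply more detailed rules to `**` than this does.

```python
import re


def path_glob_to_regex(pattern: str, sep: str = "/") -> str:
    """Translate a filename-style glob into a regex body (simplified sketch)."""
    not_sep = "[^" + re.escape(sep) + "]"  # any single character except the separator
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")               # ** may cross component separators
            i += 2
        elif pattern[i] == "*":
            out.append(not_sep + "*")      # * stays within one path component
            i += 1
        elif pattern[i] == "?":
            out.append(not_sep)            # ? is one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def path_glob_match(pattern: str, path: str) -> bool:
    return re.fullmatch(path_glob_to_regex(pattern), path) is not None


print(path_glob_match("*.txt", "notes.txt"))       # True
print(path_glob_match("*.txt", "docs/notes.txt"))  # False: * does not cross "/"
print(path_glob_match("docs/**", "docs/a/b.txt"))  # True
```

For a SPARQL string function there is no natural separator, which is one reason the plain `*` and `?` reading may be the simpler fit.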

Possibilities:

The other choice is which matching language to use.

SQL LIKE can be rewritten to a regular expression, and there are code examples for that online.
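One way such a rewrite might look, as a hedged Python sketch: `%` becomes `.*`, `_` becomes `.`, and every other character is escaped so it stays literal. The function name `like_to_regex` and the optional `escape` parameter (modelled loosely on the SQL `ESCAPE` clause) are assumptions for illustration; dialect-specific character classes are not handled.

```python
import re
from typing import Optional


def like_to_regex(pattern: str, escape: Optional[str] = None) -> str:
    """Rewrite a SQL LIKE pattern as an anchored regex string (illustrative only)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape and i + 1 < len(pattern):
            i += 1
            out.append(re.escape(pattern[i]))  # escaped character is taken literally
        elif ch == "%":
            out.append(".*")                   # % matches any run of characters
        elif ch == "_":
            out.append(".")                    # _ matches exactly one character
        else:
            out.append(re.escape(ch))          # regex metacharacters stay literal
        i += 1
    return "^" + "".join(out) + "$"


print(like_to_regex("%sparql%"))  # ^.*sparql.*$
print(like_to_regex("a_c"))       # ^a.c$
```

The same loop with `*` and `?` in place of `%` and `_` would cover the shell-style syntax, so the choice between the two is mostly about user expectations rather than implementation cost.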

In Java, there are a few open source direct implementations with * and ?, but not [ ] character classes. (The JDK supports glob on filenames, not directly for strings.)

ktk commented 1 year ago

I think being close to SQL is not a bad idea, so I like the LIKE idea. If I did not know about that, I would go for STRMATCH, as it resembles other functions in SPARQL, but then again it might add to the confusion. LIKE is unique in that sense.

And while I like the GLOB idea, I have to agree that I mainly know it from file matching.