Closed wilhelmer closed 4 years ago
This is technically fine, but not intuitive. I assume most users try to perform an exact match search by using double quotes, because that's the way it's usually done in Google.
Do you have data to back that up? I've never seen someone using quote syntax in Google Analytics for the official docs.
It would be nice if Material supported this as an additional method to perform an exact match search.
Adding some RegEx magic to defaultTransform() should do the trick.
The regex is already pretty complicated:
I'm not sure whether `+my +search +term` is semantically the same as `"my search term"`, as putting a query in quotes searches for a phrase (i.e. order is important), while the former syntax searches for occurrences, at least as far as I understand from Google and other search engines. Also, note that `+my +search +term` is currently translated to `+my* +search* +term*`.
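To illustrate the translation mentioned above, here is a minimal sketch of appending a trailing wildcard to every term. The function name `appendWildcards` is hypothetical and this is not Material's actual `defaultTransform()`; it only mirrors the `+my +search +term` to `+my* +search* +term*` behavior described in this comment.

```typescript
// Hypothetical helper (an assumption for illustration, not Material's code):
// appends a trailing wildcard to every whitespace-separated term.
function appendWildcards(query: string): string {
  return query
    .trim()
    .split(/\s+/)
    .filter(term => term.length > 0) // drop the empty string left by an empty query
    .map(term => term + "*")
    .join(" ")
}

console.log(appendWildcards("+my +search +term")) // +my* +search* +term*
```

The wildcard is what makes type-ahead matching work, which is also why the result is an occurrence search rather than a phrase search.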
In the past two years, I've had 701 site searches with quotes, 0 searches with the + operator.
Note that the + operator is deprecated in Google Search, so people familiar with Google's syntax will probably avoid it.
Well yes, that is a drawback. We can't exactly replicate Google's search behavior since lunr doesn't support specifying a word order. So `+my +search +term` will always include results that have "search" in the heading and "term" somewhere way below in a paragraph. Is that such a major drawback that we shouldn't allow search with quotes at all? Not sure.
BTW, I played around with our search and was able to improve the results by boosting the first word: `+my^100 +search +term`. But that may be anecdotal.
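For anyone who wants to try the boosting experiment, this is a small sketch that rewrites a query so the first term carries a boost. The helper name `boostFirstTerm` and the default boost of 100 are assumptions taken from the example above, not part of Material or lunr.

```typescript
// Hypothetical helper: adds a lunr-style boost (term^N) to the first term only.
function boostFirstTerm(query: string, boost: number = 100): string {
  const terms = query
    .trim()
    .split(/\s+/)
    .filter(term => term.length > 0)
  return terms
    .map((term, index) => (index === 0 ? `${term}^${boost}` : term))
    .join(" ")
}

console.log(boostFirstTerm("+my +search +term")) // +my^100 +search +term
```

Whether this improves ranking in general is untested; it only reproduces the query shape from the comment.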
I guess we could implement quotes when we have a spec that is complete. If you can draft something up as part of this issue, we might consider implementing it. Some questions to be answered by this specification:
- What happens with unmatched quotes? `"my search`
- What happens when we have modifiers in quotes? `"+my -search" +term`
- What about single quotes? `'my search`
As a rule of thumb, we should probably stay as close as possible to Google. Please specify the respective transforms for all cases. Also, there may be more edge cases I haven't thought of.
On a side note, with the release of Material 5 and the support of the whole Lunr Syntax, we should think of how we can integrate a cheat sheet of how to use the search / docs, etc. For example, search by title, fuzzy search etc. It's quite powerful:
I am not a search expert, but I would like to see this feature implemented as well, so here are my inputs on the subject.
- What happens with unmatched quotes? `"my search`
From what I'm seeing in the Google specification, unmatched quotes would be considered as an incomplete query. You may have to disable look ahead search when the query starts with a double quote.
- What happens when we have modifiers in quotes? `"+my -search" +term`
Modifiers in quotes are taken as literal characters and not interpreted as modifiers.
- What about single quotes? `'my search`
Single quotes are their own character. They may appear in SQL commands, and therefore do not encapsulate the search query.
I think unmatched quotes are simply stripped from the query. So `"my search` should be equal to `my search`.
More observations via Google:
Search for `"` -> unmatched quote is not stripped.
Search for `""` -> yields no results at all.
Some additional information here: http://www.googleguide.com/quoted_phrases.html
Tidbits:
A quoted phrase is the most widely used type of special search syntax.
So let's support it :-)
Google will search for common words (stop words) included in quotes.
Google doesn’t perform automatic stemming on phrases.
So disable lunr's pipeline for quoted search terms? Might be difficult to implement?
Okay, so I guess we have to check how to escape modifiers, so they're treated as literals in queries. The stemmer is disabled for most languages, as most people don't like it and wonder why their queries don't return any results. It also doesn't work well with the type-ahead experience. Thus, there's only a `trimmer` and `stopWordFilter` in place. Both are only part of the indexing pipeline, not of the search pipeline, which includes only the stemmer by default.
For this reason, I see no possibility to disable the `stopWordFilter` for searching or even parts of queries. Thus, from what I understand, the only thing we could try to implement would be `"my search terms"` => `+my +search +terms`.
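A minimal sketch of that simple transformation, assuming a plain regular expression is enough and ignoring modifiers inside quotes. The name `quotesToRequired` is hypothetical; this is not the code that ended up in Material.

```typescript
// Hypothetical helper: rewrites every "quoted phrase" so that each of its
// terms is prefixed with "+", leaving everything outside quotes untouched.
function quotesToRequired(query: string): string {
  return query.replace(/"([^"]+)"/g, (_match: string, phrase: string) =>
    phrase
      .trim()
      .split(/\s+/)
      .filter(term => term.length > 0)
      .map(term => "+" + term)
      .join(" ")
  )
}

console.log(quotesToRequired('"my search terms"')) // +my +search +terms
```

Note that this only requires all terms to be present. It does not enforce word order, so it is an AND search rather than a true phrase search.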
On a second note, there's another problem with some of those modifiers, as `-` is, by default, part of the tokenizer separator. Thus, all `-` are lost before adding a document to the index. This might be problematic in regard to the automatic transformation, because `"my -search +term"` would theoretically become `+my +\-search \+term`, which may lead to no results. Before implementing this feature, we really need to understand how `lunr` handles special characters. Or we just ignore modifiers and just do the simple `"my search terms"` => `+my +search +terms` transformation, disregarding any special character cases.
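To make the separator problem concrete, here is a rough sketch of what splitting on whitespace and hyphens does to indexed text. The `SEPARATOR` regex is my assumption of lunr's default tokenizer separator, and `tokenize` is a hypothetical stand-in for the real tokenizer, which does more than this.

```typescript
// Assumption: approximates lunr's default tokenizer separator (whitespace and hyphens).
const SEPARATOR = /[\s\-]+/

// Hypothetical stand-in for the indexing tokenizer: lowercases and splits the text.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(SEPARATOR)
    .filter(token => token.length > 0)
}

console.log(tokenize("my -search +term")) // [ 'my', 'search', '+term' ]
```

Since no indexed token ever starts with `-`, a query term like `+\-search` has nothing to match. The trimmer mentioned above would presumably strip the leading `+` from `+term` as well, which this sketch does not model.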
Okay, so I suggest the following proposal:
1. Terms in quotes are prefixed with `+`: `"my search" term` becomes `+my +search term`
2. Modifiers in quotes (`+`, `-`, ...) are escaped: `"+my -search term"` becomes `+\+my +\-search +term`
3. Unmatched quotes are stripped: `"my +search term` becomes `my +search term`
It's not trivial to implement, though. I guess we have to depart from the simple regular expression based transform and have to implement a rather decent parser. Should be possible in a few dozen lines.
[1] See the end of the paragraph on QueryString
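The three rules of the proposal can be sketched in a few lines by splitting on the quote character instead of writing a full parser. Everything here is an assumption for illustration: the name `transformQuery`, the set of characters treated as modifiers, and the backslash escaping, which follows the example in the proposal rather than a verified lunr behavior.

```typescript
// Assumption: the characters treated as query modifiers in this sketch.
const MODIFIERS = /[+\-~^:*]/g

// Hypothetical transform implementing the three proposed rules.
function transformQuery(query: string): string {
  const parts = query.split('"')
  return parts
    .map((part, index) => {
      // Odd-indexed parts sit between two quotes, unless the part is the
      // last one, in which case its opening quote was never closed (rule 3).
      const quoted = index % 2 === 1 && index < parts.length - 1
      if (!quoted) return part
      return part
        .trim()
        .split(/\s+/)
        .filter(term => term.length > 0)
        // Rule 2: escape modifiers, rule 1: prefix every term with "+".
        .map(term => "+" + term.replace(MODIFIERS, "\\$&"))
        .join(" ")
    })
    .join("")
    .trim()
}

console.log(transformQuery('"my search" term'))   // +my +search term
console.log(transformQuery('"+my -search term"')) // +\+my +\-search +term
console.log(transformQuery('"my +search term'))   // my +search term
```

Splitting on the quote character handles any number of quoted phrases in one pass, at the cost of not supporting escaped quotes inside a phrase.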
Thanks! Bullets 1) and 3) are fine for me. 2) looks a little overengineered. I assume the number of users that want to both use exact search and search for control characters is too low to justify the development effort. So we could leave that out, and `"+my -search term"` becomes `++my +-search +term`, which simply yields no results.
Yes, 2. is probably not much of a use case.
Implemented in #1778. Feedback appreciated so we can push this out quickly.
Also, see the comments in the doc block of the `defaultTransform` function.
Great, thanks! There's no global modifier in `.split(/"([^"]+)"/)`, meaning that only the first quoted phrase will be modified.
E.g. `"my search term" is "a great term"` -> `+my +search +term is "a great term"` -> 0 results.
We haven't discussed that - I think it's also not much of a use case, so we can leave it as it is. Just wanted to mention.
Thanks for pointing it out, we should add it!
Okay, if you change it anyway, you can also fix some small typos in the comment:
and -> an
expected -> expect
asterisik -> asterisk
All settled in 759f2b9e.
Looks good! If there's a space after the first quote, the query will be transformed like this:
`" my search term"` -> `+ +my +search +term`
At first, I thought this might be a problem, but lunr seems to handle that and remove orphan control characters. So no problem.
Released as part of 5.4.0
Why isn't this working for me? I have tried both of these:
plugins:
  - search:
      separator: '[\s\-,:!=\[\]()"/]+|(?!\b)(?=[A-Z][a-z])|\.(?!\d)|&[lg]t;'

plugins:
  - search

I searched for `example phrase` or `"example phrase"` and neither one returns any result.
EDIT:
Oh, it seems to have to do with some word-count limit in the search, but it also seems bugged in other ways. If I search for `"Cariogenic microbiome"` it will return results for both `cariogenic` and `microbiome`. If I search for `Cariogenic microbiome and microbiota of the early primary dentition` it will give me tons of results but not ranked in the "most matches" or "best matches" order, so it's useless. If I search for `"Cariogenic microbiome and microbiota of the early primary dentition"` I get no results.
Description
Material supports lunr's search syntax. With that syntax, to perform an exact match search (logical AND search), all terms must be prefixed with a plus (+):
+my +search +term
This is technically fine, but not intuitive. I assume most users try to perform an exact match search by using double quotes, because that's the way it's usually done in Google:
"my search term"
It would be nice if Material supported this as an additional method to perform an exact match search.
Adding some RegEx magic to `defaultTransform()` should do the trick.
Use Cases
Users that don't know how to do a logical AND search would have a much greater chance to find out by trial and error if double quotes were supported.