"unwanted" results

torvista commented 1 year ago

By default, searches use this field order:

model-exact,name-description,model-broad,meta-keywords,category,manufacturer

When entering a partial model number, results from name-description are shown first, before any partial matches on the model. This is to be expected, given the order of the fields.

What is confusing is that as you type further characters of the model (which match model-broad but not model-exact), the results become entirely name/description matches (which seem to match nothing in the search string), until the complete model is entered and the exact model match suddenly appears at the top of the list.

Changing the field order to

model-exact,model-broad,name-description,meta-keywords,category,manufacturer

ensures that partial model matches stay in the list, and that a unique partial match rises to the top even before the full model number has been typed.
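A minimal sketch of why the field order matters, assuming each field in the list is searched in turn, products already found are skipped, and searching stops once the maximum number of results is reached. `collect_results` and the toy matchers are hypothetical stand-ins for the plugin's per-field SQL queries, and only three of the six fields are modelled.

```python
from typing import Callable

# Hypothetical per-field matcher: takes the query, returns product IDs in
# relevance order. In the plugin each field is searched with SQL instead.
Matcher = Callable[[str], list[int]]


def collect_results(query: str, field_order: list[str],
                    matchers: dict[str, Matcher], limit: int) -> list[int]:
    """Search each field in turn, skip products already found, and stop
    as soon as `limit` results have been collected."""
    results: list[int] = []
    for field in field_order:
        for pid in matchers[field](query):
            if len(results) >= limit:
                return results  # later fields are never consulted
            if pid not in results:
                results.append(pid)
    return results


# Toy data: the full model is not typed yet, so model-exact finds nothing;
# name-description returns loose word matches; model-broad finds product 42.
matchers: dict[str, Matcher] = {
    "model-exact": lambda q: [],
    "name-description": lambda q: [10, 11, 12],
    "model-broad": lambda q: [42],
}

default_order = ["model-exact", "name-description", "model-broad"]
reordered = ["model-exact", "model-broad", "name-description"]
print(collect_results("tt-hmt.1", default_order, matchers, limit=3))  # [10, 11, 12]
print(collect_results("tt-hmt.1", reordered, matchers, limit=3))      # [42, 10, 11]
```

With the default order, the loose name/description matches use up the limit before `model-broad` is consulted, so the partial model match drops out of the list; moving `model-broad` ahead of `name-description` keeps it at the top.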

What I still don't understand is why there are results from descriptions that do not match the search string:

After adding some logging for the SQL and the results:

$queryText=tt-hmt.11
$sql=
SELECT p.*, pd.products_name, m.manufacturers_name,
  MATCH(pd.products_name) AGAINST('tt hmt 11' IN BOOLEAN MODE) AS name_relevance_boolean,
  MATCH(pd.products_name) AGAINST('tt-hmt.11' ) AS name_relevance_natural,
  MATCH(pd.products_description) AGAINST('tt-hmt.11' ) AS description_relevance
FROM products_description pd
JOIN products p ON (p.products_id = pd.products_id)
LEFT JOIN manufacturers m ON (m.manufacturers_id = p.manufacturers_id)
WHERE p.products_status <> 0
  AND pd.language_id = 1
  AND p.products_id NOT IN (8275)
  AND ( ( MATCH(pd.products_name) AGAINST('tt hmt 11*' IN BOOLEAN MODE) + MATCH(pd.products_name) AGAINST('tt-hmt.11' ) ) > 0
        OR MATCH(pd.products_description) AGAINST('tt-hmt.11' ) > 0 )
ORDER BY name_relevance_boolean DESC, name_relevance_natural DESC, description_relevance DESC, p.products_sort_order, pd.products_name
LIMIT 14

$limit=14
$this->alphaFilter=0
14 RESULT(S) FROM QUERY

I can see that the boolean clause splits the string and matches on tt, hmt or 11 (why?), and together with the natural-language relevance this produces 14 results, when I would have expected none.
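A sketch of what the boolean clause appears to do, assuming (from the logged 'tt-hmt.11' becoming 'tt hmt 11*') that the query is split on punctuation and a `*` prefix wildcard is added to the last word. In MySQL's BOOLEAN MODE a word with no `+` or `-` operator is optional, so a row matching any one of tt, hmt or 11* gets a positive score; prefixing each word with `+` makes all of them required. `boolean_terms` and `boolean_terms_all_required` are hypothetical helpers, not functions from the plugin, and how MySQL treats very short words such as "tt" also depends on full-text settings like the minimum token size.

```python
import re


def boolean_terms(query: str, wildcard_last: bool = True) -> str:
    """Split the query on punctuation/whitespace and optionally add a
    trailing '*' (prefix match) to the last word. This mirrors the logged
    'tt-hmt.11' -> 'tt hmt 11*' conversion; the plugin's exact rule is assumed."""
    words = [w for w in re.split(r"[\W_]+", query) if w]
    if wildcard_last and words:
        words[-1] += "*"
    return " ".join(words)


def boolean_terms_all_required(query: str) -> str:
    """Prefix every word with '+', so that in MySQL BOOLEAN MODE all words
    must be present, instead of any single one being enough."""
    return " ".join("+" + w for w in boolean_terms(query).split())


print(boolean_terms("tt-hmt.11"))               # tt hmt 11*
print(boolean_terms_all_required("tt-hmt.11"))  # +tt +hmt +11*
```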

marco-pm commented 1 year ago

Disable Query Expansion in the settings to prevent related items from being included in the results.

torvista commented 1 year ago

I have that disabled, but the description results still show up.

marco-pm commented 1 year ago

That's because, after searching the model, the search looks for "tt", "hmt" and "11" (including partial matches) in the product name and description. This is expected: for example, I can see in the image that "11" or "hmt" is present in some of the titles, or, if not, in the description.

torvista commented 1 year ago

I understand that it is looking for these substrings, but why is that built in/hard-coded? Is the idea to "fill up" the results to the limit when nothing matches the complete string?

marco-pm commented 1 year ago

Correct: the idea is always to find something relevant to the query, up to the maximum number of results to display. It might seem a stretch in this case (especially since it's a model number), but in most scenarios it makes sense to search for partial strings and so on.

torvista commented 1 year ago

OK, so what confused me was seeing the partial results disappear and be replaced by "bad" results. Maybe the default field order should be changed as I suggested; at least it is noted here now!

torvista commented 1 year ago

> in most scenarios it makes sense to search for partial strings and so on.

Hmm, I'm not sure about that: my colleague thought it was a fault to get a "related" result for a model code that does not exist. I have query expansion turned off, but a search for e.g. rg-rad0295 still uses the wildcard clause, when I would expect it to be removed:

MATCH(pd.products_name) AGAINST('rg rad0295' IN BOOLEAN MODE) +

If I remove that from the query code, I get no results, as expected.
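The removal described above can be sketched with a hypothetical `build_where` helper that rebuilds the match condition from the logged SQL, with a flag to leave out the BOOLEAN MODE (wildcard) term on `products_name`. This is Python for illustration only: the plugin builds its SQL in PHP and binds values with `bindVars`, while here `%s` placeholders (the DB-API "format" style used by drivers such as PyMySQL) keep the values out of the SQL text.

```python
def build_where(query: str, boolean_query: str,
                use_boolean_name_match: bool = True) -> tuple[str, list[str]]:
    """Return the match condition and its parameters, optionally without
    the BOOLEAN MODE term on products_name (hypothetical helper)."""
    name_terms: list[str] = []
    params: list[str] = []
    if use_boolean_name_match:
        # The wildcard clause, e.g. AGAINST('rg rad0295*' IN BOOLEAN MODE)
        name_terms.append("MATCH(pd.products_name) AGAINST(%s IN BOOLEAN MODE)")
        params.append(boolean_query)
    # Natural-language match on the name is always kept
    name_terms.append("MATCH(pd.products_name) AGAINST(%s)")
    params.append(query)
    sql = ("(" + " + ".join(name_terms) + ") > 0"
           " OR MATCH(pd.products_description) AGAINST(%s) > 0")
    params.append(query)
    return sql, params


sql, params = build_where("rg-rad0295", "rg rad0295*", use_boolean_name_match=False)
print(sql)
print(params)
```

With the flag off, only the natural-language MATCH terms remain, which is consistent with the rg-rad0295 search returning nothing once the boolean term is gone.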

marco-pm commented 1 year ago

The model (aka SKU or product code) search is a bit tricky.

One solution is to remove the substring search entirely, as you did. However, this also affects the search on the other fields.

The alternative is to recognize that the user is specifically searching for a model, and only then remove the substring search. But how do you do that? It's not trivial, especially when the model doesn't exist. One way I can think of is to check the user query against a model "pattern", assuming that all models in the store can be described by one or more patterns.
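A minimal sketch of the "pattern" idea. The regular expression is purely hypothetical, loosely modelled on the model codes mentioned in this thread (tt-hmt.11, rg-rad0295, HNW-EVO); a real store would need patterns that fit its own SKU scheme. `looks_like_model` and `fields_for` are illustrative names, not part of the plugin.

```python
import re
from collections.abc import Sequence

# Hypothetical SKU pattern: 2-4 letters, a hyphen, then letters, digits or
# dots (possibly none yet, to accept partially typed codes like "HNW-").
MODEL_PATTERNS: list[re.Pattern[str]] = [
    re.compile(r"[A-Za-z]{2,4}-[A-Za-z0-9.]*"),
]

DEFAULT_FIELDS = ["model-exact", "name-description", "model-broad",
                  "meta-keywords", "category", "manufacturer"]


def looks_like_model(query: str,
                     patterns: Sequence[re.Pattern[str]] = MODEL_PATTERNS) -> bool:
    """True if the whole (trimmed) query matches one of the model patterns."""
    q = query.strip()
    return any(p.fullmatch(q) for p in patterns)


def fields_for(query: str) -> list[str]:
    """Search only the model fields when the query looks like a model;
    otherwise use the full default field list."""
    if looks_like_model(query):
        return ["model-exact", "model-broad"]
    return list(DEFAULT_FIELDS)


print(fields_for("HNW-EVO"))             # ['model-exact', 'model-broad']
print(fields_for("rear light"))          # the full default list
print(looks_like_model("eco-friendly"))  # True: a false positive
```

The last line shows the difficulty: a pattern loose enough to accept partially typed codes such as HNW- also accepts ordinary hyphenated words like eco-friendly, so each store's patterns would need tuning.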

However, considering that users generally search by name/description rather than by model, and that each store has its own SKU system (it's not a standardized field), I think this is an optional customization that is up to the store owner/developer, using observers or by modifying the code directly.

torvista commented 1 year ago

Even with the above modification to remove the boolean clause, I'm still seeing this behaviour when typing in a model number:

HNW- : returns matches, with the search term highlighted in the model field, as expected.
HNW-E : no matches, as expected.
HNW-EVO : returns matches in the name and description, but nothing is highlighted. It is picking up "EVO" in the name/description. This is not expected: I expect no results.

Only when I wrap the search term in double quotes in the MATCH clause, to force an exact match, does it return nothing, as expected:

$sql = $db->bindVars($sql, ':searchQuery', '"' . $queryText . '"', 'string');

This I find confusing. Once there are no results, a longer search term should still return no results: a search term is a filter, and once the results are zero, I think they should stay at zero.
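A toy illustration, with made-up products, of why the count can go from zero back up: a search that accepts a product when its name shares any word with the query is not monotonic as the user keeps typing, whereas a filter requiring the whole typed string is. The assumption is that the natural-language MATCH behaves roughly like `any_word_match` once the hyphen splits HNW-EVO into HNW and EVO; real MySQL full-text relevance also depends on stopwords and the minimum word length.

```python
import re

# Toy catalogue with made-up data: (model, name) pairs.
PRODUCTS = [
    ("HNW-100", "Handlebar bag"),
    ("ABC-200", "EVO rear light"),
    ("XYZ-300", "Front light"),
]


def words(text: str) -> set[str]:
    """Lower-cased words, splitting on punctuation such as '-' and '.'."""
    return {w.lower() for w in re.split(r"[\W_]+", text) if w}


def any_word_match(query: str) -> list[str]:
    """Relevance-style search on the name: a product is returned if its
    name shares ANY word with the query."""
    q = words(query)
    return [model for model, name in PRODUCTS if q & words(name)]


def substring_filter(query: str) -> list[str]:
    """Filter-style search on the model: the whole typed string must occur."""
    q = query.lower()
    return [model for model, _ in PRODUCTS if q in model.lower()]


for typed in ["HNW-", "HNW-E", "HNW-EVO"]:
    print(typed, any_word_match(typed), substring_filter(typed))
```

With the any-word search, HNW-E finds nothing but HNW-EVO finds the EVO light, while the substring filter only ever shrinks. Quoting the term, as in the `bindVars` change, appears to move the MySQL query towards the filter behaviour.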

I've added code to generate a log to understand this better.

marco-pm commented 1 year ago

Be sure to comment out this line as well:

https://github.com/marco-pm/zencart_instantsearch/blob/b821ac3a96e4b37b65a2f3accb84ed19a6764ebf/zc_plugins/InstantSearch/v4.0.2/classes/SearchEngineProviders/MysqlSearchEngineProvider.php#L158

marco-pm / zencart_instantsearch

"unwanted" results #27