querqy / chorus

Towards an open source stack for e-commerce search
Apache License 2.0

[BUG] text_all field missing from index in chorus #129

Closed macohen closed 1 year ago

macohen commented 1 year ago

When going through the Katas after setting up the lab with ./quickstart.sh --with-offline-lab, adding the "Notebook Computers" case in Quepid, and entering "notebook" and "laptop" as queries, the UI spins until a message appears at the bottom:

"One or more of your Solr queries failed to return results, please access your Solr instance directly to confirm Solr is accessible and to inspect the error. If Solr responds, check if you have an ad blocker blocking your queries. With Solr 8.4.1 and later you need to allow Quepid access to Solr. Learn more on the troubleshooting Solr wiki page."

The query responds with:

{
  "responseHeader": {
    "zkConnected": true,
    "status": 400,
    "QTime": 0,
    "params": {
      "df": "id",
      "debug": "true",
      "hl": "false",
      "echoParams": [
        "all",
        "all)"
      ],
      "indent": "true",
      "fl": "id title img_500x500 name brand product_type",
      "rows": "10",
      "debug.explain.structured": "true",
      "q": "laptop",
      "tie": "1.0",
      "defType": "edismax",
      "qf": "text_all",
      "wt": "json",
      "rid": "-148"
    }
  },
  "error": {
    "metadata": [
      "error-class",
      "org.apache.solr.common.SolrException",
      "root-error-class",
      "org.apache.solr.common.SolrException"
    ],
    "msg": "org.apache.solr.search.SyntaxError: Query Field 'text_all' is not a valid field name",
    "code": 400
  }
}

The text_all field named in the qf (query fields) parameter is missing from the index.

I'm not quite sure where the query is generated in the code and was just going to add a field to the index to make it work. If I can get pointed in the right direction and this is not a config issue on my end, I'd be happy to investigate a possible fix. If it is something I missed in the kata or something missing in the kata, I could take a look there too.
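To check the failure outside of Quepid, the query can be rebuilt from the echoed parameters and the rejected field read out of the error message. The following is a minimal sketch, not Quepid's code: the host, port, and the collection name `ecommerce` are assumptions, and both helper functions are hypothetical names introduced for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Assumed values: adjust host, port, and collection to your own setup.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "ecommerce"  # hypothetical collection name


def build_select_url(q: str, qf: str) -> str:
    """Build an edismax /select URL using the core parameters from the report."""
    params = {"q": q, "defType": "edismax", "qf": qf, "rows": 10, "wt": "json"}
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


def invalid_qf_field(response: dict) -> Optional[str]:
    """Return the field name Solr rejected, or None for any other response."""
    msg = response.get("error", {}).get("msg", "")
    match = re.search(r"Query Field '([^']+)' is not a valid field name", msg)
    return match.group(1) if match else None
```

Opening the URL from `build_select_url("laptop", "text_all")` in a browser or with curl should show the same 400 response, and swapping `qf` for a field that exists in the schema should make it go away.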

epugh commented 1 year ago

I'm doing a bit of digging to see if I can reproduce... "text_all" is a very common field when you use the classic techproducts schema as your starting point! However, we have an optimized schema... Going to see if maybe Quepid assumes by default that text_all exists...

risdenk commented 1 year ago

It looks like Quepid defaults qf to text_all: https://github.com/o19s/quepid/blame/main/app/assets/javascripts/services/settingsSvc.js#L27

epugh commented 1 year ago

Well, this is interesting....

https://github.com/o19s/quepid/blob/c538c0ccfd95124d159c3b12e8e80bf2cd2c6d35/app/assets/javascripts/services/settingsSvc.js#L27

We may need to rethink what our defaults in Quepid are! It doesn't help that Quepid communicates with Solr via JSONP, which means that if we don't get a response, the UI just spins. We don't have that problem with OpenSearch, which uses a proper CORS request.

renekrie commented 1 year ago

To be honest, I'd prefer not to have a text_all field in an e-commerce schema. It could be a solution in the exceptional case that the data is messy and we can't find a better structure. We shouldn't merge text from fields that must not be stemmed (e.g. names) into a single field with text that should be stemmed, because both would then be treated the same way.

epugh commented 1 year ago

So, Quepid uses this query to look up fields to drive the UI, and that honestly should probably be a request to /luke that returns the list of defined fields, not a query whose first X results we scan for unique fields.
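A field lookup through Solr's Luke request handler could look roughly like the sketch below. This is an illustration only, not what Quepid does: the base URL and collection are placeholders, the function names are hypothetical, and the response is assumed to carry a top-level `fields` object keyed by field name.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def luke_url(solr_base: str, collection: str) -> str:
    """URL of the Luke handler; numTerms=0 skips the per-field top-terms lists."""
    params = {"numTerms": 0, "wt": "json"}
    return f"{solr_base}/{collection}/admin/luke?{urlencode(params)}"


def field_names(luke_response: dict) -> list:
    """Sorted field names from a Luke response (assumed shape: {"fields": {...}})."""
    return sorted(luke_response.get("fields", {}))


def fetch_field_names(solr_base: str, collection: str) -> list:
    """Call a running Solr instance and return its field names."""
    with urlopen(luke_url(solr_base, collection), timeout=10) as resp:
        return field_names(json.load(resp))
```

With a list like this, the UI could validate a default qf before sending the first query instead of waiting for a 400.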

epugh commented 1 year ago

Okay, went through the process, and it turns out that we do a basic q=*:* to populate the list of fields in the case wizard... The use of qf=text_all is the default configuration from the wizard, and it assumes you are using the TMDB demo. I'm going to open a new ticket in Quepid. We need to rethink our wizard: today we have one set of default search engine settings that is optimized for the TMDB demo, and we need a separate set of defaults for a non-TMDB example (like this case)! Thanks @macohen for discovering this.
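The field discovery described here, a q=*:* query followed by collecting the field names that appear in the returned documents, can be sketched as follows. The function name is hypothetical and this is not Quepid's actual code.

```python
def fields_seen(docs: list) -> list:
    """Collect the distinct field names found in a sample of result documents,
    in first-seen order."""
    seen = []
    for doc in docs:
        for name in doc:
            if name not in seen:
                seen.append(name)
    return seen


# Two documents shaped like entries of response.docs from a q=*:* query.
sample = [
    {"id": "1", "title": "Notebook A", "brand": "Acme"},
    {"id": "2", "title": "Notebook B", "product_type": "laptop"},
]
print(fields_seen(sample))
```

One limitation of this approach is that it only sees fields returned in the sampled documents, so a field that is not stored, or that is empty in those documents, never shows up in the list.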

macohen commented 1 year ago

Is there anything I can do to work around this? One of the things we want to do is put together an installation of Chorus for us to investigate, pick apart, and understand from a UX perspective. We're interested in seeing how it might all fit together, especially Quepid and SMUI, since we already have a plugin for Querqy now.

renekrie commented 1 year ago

@epugh How about we switch to using title as the default field in Quepid as a quick fix? I think this is a very common field in many schemas, and it doesn't assume a 'catch-all' pattern. It also exists in the Chorus schema, but I'm not sure how much data it has. We should still fix the Quepid wizard and allow qf to be set there.

@macohen Can you get past the wizard and edit qf under 'Tune Relevance'?
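One way to picture a safer default is to choose qf from the fields that actually exist rather than hard-coding one. The sketch below is purely illustrative: the function name and the preference order are assumptions, not the behavior of any Quepid release.

```python
from typing import Iterable, Optional, Sequence


def pick_default_qf(
    available: Iterable[str],
    preferred: Sequence[str] = ("text_all", "title"),  # assumed preference order
) -> Optional[str]:
    """Return the first preferred field that exists, or None if none does."""
    available_set = set(available)
    for name in preferred:
        if name in available_set:
            return name
    return None
```

With the fields listed in the failing request's fl parameter (id, title, img_500x500, name, brand, product_type), this would select title, and returning None leaves room for the wizard to ask the user for a qf instead of guessing.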

macohen commented 1 year ago

@renekrie editing the qf in Tune Relevance worked. I'll bring it up locally and see if we come across anything else. Thanks!

epugh commented 1 year ago

A quick update: I'm hoping to finish reworking how we pick the default case settings in Quepid so that it is smarter about whether you are using our demo TMDB dataset or not. This will hopefully go out before Thanksgiving and be a nicer fix. I'll update Chorus to the latest Quepid, which also ships Jupyter notebooks with the Quepid stack for doing your own ad hoc relevancy analysis.

epugh commented 1 year ago

Rolling out the Quepid 6.14.0 release now... I have upgraded Chorus to that version, so give it a try!