manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0
9.11k stars 509 forks

Add warning on CALL SUGGEST with morphology=smth #2320

Closed: donhardman closed this issue 5 months ago

donhardman commented 5 months ago

Bug Description:

CALL SUGGEST relies on bigrams or trigrams, which may be the root cause of the problem we're observing. With the lemmatizer enabled, it appears to return unusual or incorrect word suggestions, as shown in the table below:

| flase   | 4        | 1     |
| pease   | 4        | 1     |
| nurse   | 4        | 1     |
| jwise   | 4        | 1     |
| nouse   | 4        | 1     |
| rense   | 4        | 1     |
| onuse   | 4        | 1     |
| dbase   | 4        | 1     |
| hpuse   | 4        | 1     |
| dxuse   | 4        | 1     |
| argse   | 4        | 1     |
| twise   | 4        | 1     |
+---------+----------+-------+

We should consider adding a warning or notification to alert users about this potential issue. It's likely that the lemmatizer is unable to distinguish between lemmas and regular words when CALL SUGGEST retrieves suggestions from the dictionary. However, we may be able to differentiate them based on the equal sign (=) or other indicators.
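To make the mechanism above concrete, here is a minimal Python sketch of n-gram based suggestion in general. It is not Manticore's implementation: the function names, the toy dictionary, and the ranking rule are all hypothetical and chosen only for illustration. The point is that a suggester of this kind ranks whatever terms the dictionary holds, so any form stored there (for example one produced by morphology processing) can surface as a suggestion.

```python
def trigrams(word):
    """Return the set of character trigrams of a word, padded at both ends."""
    padded = f"__{word}__"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(query, dictionary, max_distance=4, limit=5):
    """Hypothetical suggester: keep terms sharing a trigram with the query,
    then rank by edit distance and, on ties, by document count."""
    query_grams = trigrams(query)
    candidates = []
    for term, docs in dictionary.items():
        if not (query_grams & trigrams(term)):
            continue  # no shared trigram, so not a candidate
        distance = levenshtein(query, term)
        if distance <= max_distance:
            candidates.append((term, distance, docs))
    candidates.sort(key=lambda c: (c[1], -c[2], c[0]))
    return candidates[:limit]


# Toy dictionary (assumed data): term -> number of documents containing it.
TOY_DICTIONARY = {"search": 12, "searching": 3, "sear": 1, "march": 5, "table": 2}

for term, distance, docs in suggest("serch", TOY_DICTIONARY):
    print(f"{term:<10} distance={distance} docs={docs}")
```

In this sketch every dictionary entry is treated alike; filtering on a marker such as a leading `=` would be one extra condition inside the loop, assuming the dictionary actually tags entries that way.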

Manticore Search Version:

Latest dev version

Operating System Version:

Ubuntu Jammy

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

- [ ] Implementation completed
- [ ] Tests developed
- [ ] Documentation updated
- [ ] Documentation reviewed
- [ ] Changelog updated

manticoresearch commented 5 months ago

Please provide the full query.

sanikolaev commented 5 months ago

As discussed, there's probably no problem here.

donhardman commented 5 months ago

As discussed and validated earlier, we concluded that this is not an issue. It was a misunderstanding of the outputs, which appeared incorrect due to user errors. However, the outputs are actually correct. We are closing this issue.