teamtnt / tntsearch

A fully featured full text search engine written in PHP
https://tnt.studio/solving-the-search-problem-with-laravel-and-tntsearch
MIT License

Inaccurate results when searching two or more keywords. #309

Open Sianature opened 7 months ago

Sianature commented 7 months ago

Hi all

I am using TNTSearch with a MySQL database. When I search the index for two or more words, TNTSearch returns wrong matches that contain only one of the keywords, not both. Is there a way to fix this?

Here is how I index my database:

Indexing code:

    require 'vendor/autoload.php';

    use TeamTNT\TNTSearch\TNTSearch;

    $tnt = new TNTSearch;
    $tnt->loadConfig([
        'driver'   => 'mysql',
        'host'     => 'localhost',
        'database' => 'products2023',
        'username' => 'root',
        'password' => '',
        'storage'  => 'C:\xampp2023\htdocs\Text search engines\tntsearch\indexes',
    ]);

    $indexer = $tnt->createIndex('products2023.index');
    $indexer->query('SELECT p.id, d.volume, ing.chemicals, c.CO_NAME
        FROM products p
        LEFT JOIN dimensions d ON d.prod_id = p.id
        LEFT JOIN ingredients ing ON ing.prod_id = p.id
        LEFT JOIN company c ON c.CO_NR = p.CO_NR');
    $indexer->run();

Searching code:

    require 'vendor/autoload.php';

    use TeamTNT\TNTSearch\TNTSearch;

    include 'db_pdo_connect.php';

    $tnt = new TNTSearch;
    $tnt->loadConfig([
        'storage' => 'C:\xampp2023\htdocs\Text search engines\tntsearch\indexes',
    ]);

    $tnt->selectIndex("products2023.index");
    $res = $tnt->search("dupont pvc", 10);

This search matches products that do not have 'dupont' in them.

nticaric commented 7 months ago

In TNTSearch, searching for a phrase like this doesn't automatically imply an 'AND' operation between the words, so the search doesn't guarantee that results contain both 'dupont' and 'pvc'. Instead, TNTSearch uses the BM25 algorithm to rank results by relevance. Based on this algorithm, it seems to treat the term 'pvc' as more relevant in the documents it returns.

Also, make sure that the indexer query returns an id column.
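
To illustrate the point about BM25 and the missing implicit 'AND', here is a minimal sketch of BM25-style scoring over a toy corpus. It is not TNTSearch's own scoring code: the documents are made up, the function name `bm25Scores` is hypothetical, and `k1 = 1.2`, `b = 0.75` are just the commonly used defaults. The only thing it is meant to show is that any document containing at least one query term gets a score and is therefore returned.

```php
<?php
// Toy corpus (hypothetical data, not the poster's tables).
$docs = [
    1 => 'dupont pvc pipe',
    2 => 'pvc pvc sheet',
    3 => 'acme steel pipe',
    4 => 'acme pvc film',
];

// Simplified BM25: every query term adds to the score of each document
// that contains it; there is no requirement that all terms are present.
function bm25Scores(array $docs, string $query, float $k1 = 1.2, float $b = 0.75): array
{
    $tokens = array_map(fn($d) => explode(' ', $d), $docs);
    $n      = count($tokens);
    $avgLen = array_sum(array_map('count', $tokens)) / $n;
    $scores = [];

    foreach (explode(' ', $query) as $term) {
        // Document frequency: how many documents contain the term.
        $df = count(array_filter($tokens, fn($t) => in_array($term, $t, true)));
        if ($df === 0) {
            continue;
        }
        $idf = log(1 + ($n - $df + 0.5) / ($df + 0.5));

        foreach ($tokens as $id => $t) {
            // Term frequency of the term in this document.
            $tf = count(array_keys($t, $term, true));
            if ($tf === 0) {
                continue;
            }
            $norm        = $tf + $k1 * (1 - $b + $b * count($t) / $avgLen);
            $scores[$id] = ($scores[$id] ?? 0.0) + $idf * $tf * ($k1 + 1) / $norm;
        }
    }

    arsort($scores); // highest score first, keys (document ids) preserved
    return $scores;
}

$scores = bm25Scores($docs, 'dupont pvc');
print_r($scores);
```

In this toy run, document 1 (which has both terms) ranks first, but documents 2 and 4 still appear even though they contain only 'pvc'. Only document 3, which has neither term, is left out.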

Sianature commented 7 months ago

Hi @nticaric

Thanks for your response. How should I enforce an 'AND' operation to make sure both keywords exist in the matched results? For the id, do I need to return an id for each table or just one id (the primary key) for the first table?

Thnx

somegooser commented 7 months ago

Try to use searchBoolean($string) instead!
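
For reference, a minimal sketch of what that could look like with the setup from the first post. It assumes the index `products2023.index` has already been built and reuses the poster's storage path. As far as I understand the TNTSearch README, terms separated by a space are ANDed in a boolean search, `or` gives an OR, and a leading `-` excludes a term.

```php
<?php
require 'vendor/autoload.php';

use TeamTNT\TNTSearch\TNTSearch;

$tnt = new TNTSearch;
$tnt->loadConfig([
    'storage' => 'C:\xampp2023\htdocs\Text search engines\tntsearch\indexes',
]);
$tnt->selectIndex('products2023.index');

// Both 'dupont' AND 'pvc' must be present in a matching document.
$res = $tnt->searchBoolean('dupont pvc', 10);

// Either term is enough.
$either = $tnt->searchBoolean('dupont or pvc', 10);

// 'pvc' must be present, 'dupont' must not.
$withoutDupont = $tnt->searchBoolean('pvc -dupont', 10);
```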

Sianature commented 7 months ago

Thanks, @somegooser. Using searchBoolean solved the problem. However, adding fuzziness messed it up again: non-relevant matches appeared.

somegooser commented 7 months ago

Try playing with the parameters, for example enable 'asYouType', and see if the results get any better.

Sianature commented 7 months ago

@somegooser Tried that but had no luck! Here are my TNTSearch class parameters:

    class TNTSearch
    {
        public $config;
        public $asYouType            = true;
        public $maxDocs              = 500;
        public $tokenizer            = null;
        public $index                = null;
        public $stemmer              = null;
        public $fuzziness            = true;
        public $fuzzy_prefix_length  = 2;
        public $fuzzy_max_expansions = 500;
        public $fuzzy_distance       = 3;
        protected $dbh               = null;
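
A rough illustration of why these fuzzy settings may let unrelated terms through. This is a simplified sketch, not TNTSearch's actual code: it assumes fuzzy matching keeps index terms that share the first `fuzzy_prefix_length` characters with the keyword and lie within `fuzzy_distance` edits of it. The vocabulary and the function name `fuzzyCandidates` are made up for the example.

```php
<?php
// Hypothetical vocabulary; a real index wordlist would be far larger.
$vocabulary = ['dupont', 'duplex', 'durable', 'dust', 'dumont', 'pvc'];

// Keep terms that share the keyword's prefix and are within $maxDistance
// edits (Levenshtein distance) of the keyword.
function fuzzyCandidates(string $keyword, array $vocabulary, int $prefixLength, int $maxDistance): array
{
    $prefix  = substr($keyword, 0, $prefixLength);
    $matches = [];

    foreach ($vocabulary as $term) {
        if (strncmp($term, $prefix, $prefixLength) !== 0) {
            continue; // prefix differs, term is never considered
        }
        if (levenshtein($keyword, $term) <= $maxDistance) {
            $matches[] = $term;
        }
    }

    return $matches;
}

// Settings from the post above: prefix length 2, distance 3.
$loose = fuzzyCandidates('dupont', $vocabulary, 2, 3);

// A tighter distance for comparison.
$strict = fuzzyCandidates('dupont', $vocabulary, 2, 1);

print_r($loose);
print_r($strict);
```

With a distance of 3 on a six-letter keyword, half of the letters may differ, so in this toy vocabulary 'duplex' and 'dust' count as matches for 'dupont'. With a distance of 1 only 'dupont' and 'dumont' remain. Since the properties in the listing are public, they can presumably be set on the instance (for example `$tnt->fuzzy_distance = 1;`) rather than edited in the class file.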

nticaric commented 7 months ago

Can you provide us with a sample of your dataset? From the first post, it seems you are joining other tables while building the index, and you don't specify how you retrieve the documents after the search returns its results. Usually the ->search() method is enough, and the document being searched for is among the first 5 results.