teamtnt / tntsearch

A fully featured full-text search engine written in PHP
https://tnt.studio/solving-the-search-problem-with-laravel-and-tntsearch
MIT License

Wrong search results in 1.3.5. Correct results in 1.0.7 #163

sheriffmarley closed this issue 6 years ago

sheriffmarley commented 6 years ago

I use the Laravel Scout TNTSearch Driver in combination with tntsearch.

I updated a customer project, and tntsearch was updated to the latest version, 1.3.5, along with it.

At first glance everything looked fine, but after a while I noticed that the search results were horribly inaccurate.

It's a movie database inspired by your TV Shows Search, with more than 15k movies in it. With 1.3.5, searching for "Matrix" never returned any of the Matrix movies, whereas 1.0.7 returned exactly the 3 relevant entries.

I tried a lot of other titles but rarely got the correct results with 1.3.5. I then downgraded back to 1.0.7 and got the expected results. Why does this happen?

Do I have to update the search logic in the Laravel project, or did I miss something important? My search call is simply:

    $movies = Movie::search($query)->get();

nticaric commented 6 years ago

Hmm, this is very strange. Are you using fuzziness? That's the only part we might have messed up.

sheriffmarley commented 6 years ago

Settings:

    'fuzziness' => true,
    'fuzzy' => [
        'prefix_length' => 2,
        'max_expansions' => 50,
        'distance' => 2
    ],
    'asYouType' => true,
    'searchBoolean' => true

So yes, fuzziness is enabled.
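The three `fuzzy` settings appear to correspond to the `fuzzy_prefix_length`, `fuzzy_max_expansions` and `fuzzy_distance` properties used by the `fuzzySearch` method quoted later in this thread (the mapping from the Scout config is an assumption, not shown here). A minimal in-memory sketch of that candidate lookup follows; `fuzzyCandidates` and the `$wordlist` rows are hypothetical stand-ins for the SQLite `wordlist` table, and the prefix check is case-sensitive, unlike SQLite's `LIKE`:

```php
<?php
/**
 * Hypothetical sketch of TNTSearch-style fuzzy candidate selection.
 * $wordlist mimics rows of the `wordlist` table (`term`, `num_hits`);
 * the data below is invented for illustration.
 */
function fuzzyCandidates(array $wordlist, string $keyword, int $prefixLength, int $maxExpansions, int $maxDistance): array
{
    $prefix = mb_strtolower(substr($keyword, 0, $prefixLength));

    // Emulates: WHERE term LIKE '<prefix>%' ORDER BY num_hits DESC LIMIT <maxExpansions>
    $rows = array_values(array_filter(
        $wordlist,
        fn (array $row): bool => strncmp($row['term'], $prefix, strlen($prefix)) === 0
    ));
    usort($rows, fn (array $a, array $b): int => $b['num_hits'] <=> $a['num_hits']);
    $rows = array_slice($rows, 0, $maxExpansions);

    // Keep only terms within the allowed Levenshtein distance of the keyword.
    $result = [];
    foreach ($rows as $row) {
        if (levenshtein($row['term'], $keyword) <= $maxDistance) {
            $result[] = $row['term'];
        }
    }
    return $result;
}

$wordlist = [
    ['term' => 'man',     'num_hits' => 900],
    ['term' => 'mad',     'num_hits' => 400],
    ['term' => 'matrix',  'num_hits' => 3],
    ['term' => 'matrixx', 'num_hits' => 1],
];

// max_expansions = 2: only "man" and "mad" are fetched, both too far from "matrix".
echo json_encode(fuzzyCandidates($wordlist, 'matrix', 2, 2, 2)), PHP_EOL;  // []
// max_expansions = 50: "matrix" (distance 0) and "matrixx" (distance 1) survive.
echo json_encode(fuzzyCandidates($wordlist, 'matrix', 2, 50, 2)), PHP_EOL; // ["matrix","matrixx"]
```

Because the `max_expansions` limit is applied before the distance check, a rare term can be crowded out by more frequent terms sharing its prefix. Both versions of `fuzzySearch` quoted in this thread share that behavior, so on its own it does not explain the regression reported here.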

nticaric commented 6 years ago

Have you tried to do a reindex?

sheriffmarley commented 6 years ago

I've reindexed more than once.

nticaric commented 6 years ago

Can you track down the exact version that broke it, going up from 1.0.7 to the current one?

michaelklopf commented 6 years ago

Hi @nticaric,

After updating the TNTSearch driver, which also pulled in the current version of this library, I'm seeing a problem with fuzzy search too.

I was able to track down the version in which TNTSearch.php started returning no results (in my case).

Something broke in tag v1.3.4; TNTSearch.php from tag v1.3.3 still works.

Replacing

    public function fuzzySearch($keyword)
    {
        $prefix         = substr($keyword, 0, $this->fuzzy_prefix_length);
        $searchWordlist = "SELECT * FROM wordlist WHERE term like :keyword ORDER BY num_hits DESC LIMIT {$this->fuzzy_max_expansions}";
        $stmtWord       = $this->index->prepare($searchWordlist);
        $stmtWord->bindValue(':keyword', mb_strtolower($prefix)."%");
        $stmtWord->execute();
        $matches = $stmtWord->fetchAll(PDO::FETCH_ASSOC);

        $resultSet = [];
        foreach ($matches as $match) {
            if (levenshtein($match['term'], $keyword) <= $this->fuzzy_distance) {
                $resultSet[] = $match;
            }
        }
        return $resultSet;
    }

with the old version

    /**
     * @param $keyword
     *
     * @return array
     */
    public function fuzzySearch($keyword)
    {
        $prefix         = substr($keyword, 0, $this->fuzzy_prefix_length);
        $searchWordlist = "SELECT * FROM wordlist WHERE term like :keyword ORDER BY num_hits DESC LIMIT {$this->fuzzy_max_expansions}";
        $stmtWord       = $this->index->prepare($searchWordlist);
        $stmtWord->bindValue(':keyword', mb_strtolower($prefix)."%");
        $stmtWord->execute();
        $matches = $stmtWord->fetchAll(PDO::FETCH_ASSOC);

        $resultSet = [];
        foreach ($matches as $match) {
            $distance = levenshtein($match['term'], $keyword);
            if ($distance <= $this->fuzzy_distance) {
                $match['distance'] = $distance;
                $resultSet[]       = $match;
            }
        }

        // Sort the data by distance, and then by num_hits
        $distance = [];
        $hits     = [];
        foreach ($resultSet as $key => $row) {
            $distance[$key] = $row['distance'];
            $hits[$key]     = $row['num_hits'];
        }
        array_multisort($distance, SORT_ASC, $hits, SORT_DESC, $resultSet);

        return $resultSet;
    }

solves the problem as far as I can see.
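The sorting step at the end of the old version orders the matches by Levenshtein distance (closest first) and then by `num_hits` (most frequent first). As a sketch, the same ordering can be expressed with `usort` and the spaceship operator; `sortFuzzyMatches` and the sample rows are hypothetical:

```php
<?php
/**
 * Hypothetical helper: order fuzzy matches by distance ascending,
 * then by num_hits descending, like the array_multisort call above.
 */
function sortFuzzyMatches(array $resultSet): array
{
    usort($resultSet, function (array $a, array $b): int {
        // Arrays compare element by element; swapping $a/$b on num_hits
        // makes that key sort descending while distance stays ascending.
        return [$a['distance'], $b['num_hits']] <=> [$b['distance'], $a['num_hits']];
    });
    return $resultSet;
}

// Invented sample rows, as fuzzySearch would produce for the keyword "matrix".
$matches = [
    ['term' => 'matrx',  'num_hits' => 50, 'distance' => 1],
    ['term' => 'matrix', 'num_hits' => 3,  'distance' => 0],
    ['term' => 'matric', 'num_hits' => 80, 'distance' => 1],
];

echo implode(', ', array_column(sortFuzzyMatches($matches), 'term')), PHP_EOL; // matrix, matric, matrx
```

With this ordering an exact match always comes first, even when closer-but-not-exact terms have far more hits.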

nticaric commented 6 years ago

@michaelklopf OK, can you submit a PR so we can merge the fix?

michaelklopf commented 6 years ago

Do you want to just revert it to the old state, or do you want to dig deeper into why the new version isn't working?

nticaric commented 6 years ago

It would certainly be better to dig a little deeper into this, but at the moment I don't have the time, so if you have the time and inclination, please try to debug it.

For a quick fix, we can just revert it.

michaelklopf commented 6 years ago

Oh, the story gets better. I hadn't checked the later commits: you already reverted that part of the code in July. But a new bug was introduced somewhere else, which I haven't found yet.

michaelklopf commented 6 years ago

@nticaric Found the problem and made a PR with a fix. See https://github.com/teamtnt/tntsearch/pull/168

Will you tag a new release after that? And while you're at it, could you tag a new release of the search driver too?

nticaric commented 6 years ago

Yep, a new release has been made, so I'm closing this issue.