yooper / php-text-analysis

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language.
https://github.com/yooper/php-text-analysis/wiki
MIT License

Find most similar #34

Closed by it-is-hacker-time 6 years ago

it-is-hacker-time commented 6 years ago

Which algorithm should I use to find the closest match for a string within a set of strings?

Example of known inputs:

I would like a cheese pizza
I would like a cheese pizza with onions
I would like a cheese pizza without onions

Input I want to match against them to find the most similar one, if any of them is similar (in this example the only differences are spelling mistakes):

I would like a ceese pizza with out onnions.

yooper commented 6 years ago

I recommend using the cosine similarity algorithm.

$text = [];
$text[] = tokenize("I would like a cheese pizza");
$text[] = tokenize("I would like a cheese pizza with onions");
$text[] = tokenize("I would like a cheese pizza without onions");
$compareAgainst = tokenize("I would like a ceese pizza with out onnions.");
$bestScore = 0;
$bestIdx = 0;
$compare = new CosineSimilarityComparison();
foreach($text as $index => $t)
{
    // Remember the index of the highest-scoring candidate seen so far.
    $score = $compare->similarity($t, $compareAgainst);
    if($score > $bestScore) {
        $bestScore = $score;
        $bestIdx = $index;
    }
}

// Each entry of $text is an array of tokens, so join it back into a string.
echo "best match " . implode(' ', $text[$bestIdx]);
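
For readers who want to see what the cosine similarity score measures, here is a minimal plain-PHP sketch that does not use the library at all. The helpers `tokenFrequencies()` and `cosineSimilarity()` are hypothetical names made up for this illustration, and the whitespace tokenization is an assumption; the library's `tokenize()` and `CosineSimilarityComparison` may normalize text differently and produce different scores. The arrow functions assume PHP 7.4 or later.

```php
<?php
// Illustrative sketch only: tokenFrequencies() and cosineSimilarity() are
// hypothetical helpers, not part of php-text-analysis.

/** Lowercase the text, split it on whitespace, and count each token. */
function tokenFrequencies(string $text): array
{
    $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($tokens);
}

/** Cosine of the angle between two token-count vectors (0.0 to 1.0). */
function cosineSimilarity(array $a, array $b): float
{
    // Dot product over the tokens the two vectors share.
    $dot = 0.0;
    foreach ($a as $token => $count) {
        $dot += $count * ($b[$token] ?? 0);
    }
    // Euclidean length of each vector.
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $a)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $b)));

    return ($normA > 0 && $normB > 0) ? $dot / ($normA * $normB) : 0.0;
}

$known = [
    "I would like a cheese pizza",
    "I would like a cheese pizza with onions",
    "I would like a cheese pizza without onions",
];
$query = tokenFrequencies("I would like a ceese pizza with out onnions.");

$bestScore = 0.0;
$bestIdx = 0;
foreach ($known as $index => $sentence) {
    $score = cosineSimilarity(tokenFrequencies($sentence), $query);
    if ($score > $bestScore) {
        $bestScore = $score;
        $bestIdx = $index;
    }
}

echo "best match: {$known[$bestIdx]}\n";
```

With this whitespace tokenization the sketch selects "I would like a cheese pizza with onions", because misspelled words such as "ceese" and "onnions." count as tokens that match nothing, while "with" still matches. Whole-word cosine similarity therefore tolerates a few misspelled words but does not recognize them as near matches.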

tematres commented 6 years ago

The same code with some corrections:

require_once('vendor/autoload.php');

use TextAnalysis\Comparisons\CosineSimilarityComparison;

$text = [];
$text[] = "I would like a cheese pizza";
$text[] = "I would like a cheese pizza with onions";
$text[] = "I would like a cheese pizza without onions";

$compareAgainst = tokenize("I would like a ceese pizza with out onnions.");

//$bestScore = 0;
$best = 0;
$bestIdx = 0;
$compare = new CosineSimilarityComparison();

foreach($text as $index => $t)
{
    $t = tokenize($t);
    $score = $compare->similarity($t, $compareAgainst);
    if($score > $best) {
        $best = $score;
        $bestIdx = $index;
    }
}

echo "best match {$text[$bestIdx]}";