Closed 6 years ago (it-is-hacker-time).
I recommend using the cosine similarity algorithm.
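Cosine similarity treats each string as a vector of word counts and scores two strings by the cosine of the angle between them: cos(θ) = (a · b) / (|a| |b|). The score is 1 for identical word distributions and 0 when no words are shared. Below is a rough plain-PHP sketch of that idea. It does not reproduce any particular library's implementation. The helpers `wordCounts` and `cosineSimilarity` are hypothetical names, and the lowercase/non-alphanumeric tokenizer is an assumption:

```php
<?php
// Hypothetical helper: lowercase, split on anything that is not a letter or digit,
// and count how often each word occurs (a "bag of words" vector).
function wordCounts(string $s): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($s), -1, PREG_SPLIT_NO_EMPTY);
    return array_count_values($words);
}

// Hypothetical helper: cosine of the angle between two [word => count] vectors,
// i.e. dot(a, b) / (|a| * |b|); returns 0.0 if either vector is empty.
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $word => $count) {
        $dot += $count * ($b[$word] ?? 0);
    }
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $a)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $b)));
    return ($normA > 0 && $normB > 0) ? $dot / ($normA * $normB) : 0.0;
}

// "cheese pizza" vs "pizza with cheese": 2 shared words, norms sqrt(2) and sqrt(3).
printf("%.3f\n", cosineSimilarity(wordCounts("cheese pizza"), wordCounts("pizza with cheese")));
// prints 0.816
```

Word order does not matter, only which words occur and how often. A misspelled word such as "ceese" counts as a different word from "cheese", so it adds nothing to the score.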
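```php
// Assumes the php-text-analysis library (yooper/php-text-analysis) is installed
// via Composer; it provides the tokenize() helper and CosineSimilarityComparison.
require_once('vendor/autoload.php');

use TextAnalysis\Comparisons\CosineSimilarityComparison;

$text = [];
$text[] = tokenize("I would like a cheese pizza");
$text[] = tokenize("I would like a cheese pizza with onions");
$text[] = tokenize("I would like a cheese pizza without onions");

$compareAgainst = tokenize("I would like a ceese pizza with out onnions.");

$best = 0;
$bestIdx = 0;
$compare = new CosineSimilarityComparison();

// Keep the index of the candidate with the highest similarity score.
foreach ($text as $index => $t) {
    $score = $compare->similarity($t, $compareAgainst);
    if ($score > $best) {
        $best = $score;
        $bestIdx = $index;
    }
}

// $text holds token arrays, so join the tokens back into a string for output.
echo "best match " . implode(' ', $text[$bestIdx]);
```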
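The same code, reworked to keep the original strings in `$text` and tokenize them inside the loop, so that the best match prints as a sentence rather than as a token array: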
```php
require_once('vendor/autoload.php');

use TextAnalysis\Comparisons\CosineSimilarityComparison;

$text = [];
$text[] = "I would like a cheese pizza";
$text[] = "I would like a cheese pizza with onions";
$text[] = "I would like a cheese pizza without onions";

$compareAgainst = tokenize("I would like a ceese pizza with out onnions.");

$best = 0;
$bestIdx = 0;
$compare = new CosineSimilarityComparison();

foreach ($text as $index => $t) {
    $t = tokenize($t);
    $score = $compare->similarity($t, $compareAgainst);
    if ($score > $best) {
        $best = $score;
        $bestIdx = $index;
    }
}

echo "best match {$text[$bestIdx]}";
```
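The code above compares whole-word tokens, so misspelled words such as "ceese" and "onnions" contribute nothing. Only the correctly spelled words drive the match. Because the question is about spelling mistakes, one option that still uses cosine similarity is to compare character trigrams instead of words. That way "ceese" still shares "ees", "ese" and "se " with "cheese". Below is a minimal plain-PHP sketch that does not use the library. The helpers `charNgrams` and `ngramCosine` are hypothetical names. The trigram size and the normalization (lowercasing, collapsing punctuation to spaces, padding with spaces) are assumptions you may want to tune:

```php
<?php
// Hypothetical helper: lowercase, collapse runs of non-alphanumerics to one space,
// pad with spaces, and count overlapping character n-grams (trigrams by default).
function charNgrams(string $s, int $n = 3): array
{
    $s = ' ' . trim(preg_replace('/[^a-z0-9]+/', ' ', strtolower($s))) . ' ';
    $grams = [];
    for ($i = 0; $i + $n <= strlen($s); $i++) {
        $gram = substr($s, $i, $n);
        $grams[$gram] = ($grams[$gram] ?? 0) + 1;
    }
    return $grams;
}

// Hypothetical helper: cosine similarity of two [n-gram => count] vectors.
function ngramCosine(array $a, array $b): float
{
    $dot = 0.0;
    foreach ($a as $key => $count) {
        $dot += $count * ($b[$key] ?? 0);
    }
    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $a)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $b)));
    return ($normA > 0 && $normB > 0) ? $dot / ($normA * $normB) : 0.0;
}

$known = [
    "I would like a cheese pizza",
    "I would like a cheese pizza with onions",
    "I would like a cheese pizza without onions",
];
$query = charNgrams("I would like a ceese pizza with out onnions.");

// Same "keep the highest score" loop as above, but on trigram vectors.
$best = -1.0;
$bestIdx = 0;
foreach ($known as $index => $candidate) {
    $score = ngramCosine(charNgrams($candidate), $query);
    if ($score > $best) {
        $best = $score;
        $bestIdx = $index;
    }
}

echo "best match {$known[$bestIdx]}\n";
```

A smaller n is more forgiving of typos but also matches unrelated strings more easily. A larger n behaves more like word-level matching.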
What algorithm should I use to find the closest match between a string and a set of strings?
Example of known inputs:
The input I want to match against them, to find the most similar one if any are similar (in this example there are only spelling mistakes):