Closed sasi143 closed 4 years ago
@taleinat thanks for your reply. When I try to find a near match for my string in a given sentence, I get multiple results with max_l_dist.
Here is the sample output I am getting:
[Match(start=26, end=32, dist=1, matched='4480-5')]
[Match(start=24, end=29, dist=1, matched='22448')]
[Match(start=26, end=30, dist=1, matched='4480')]
[Match(start=25, end=29, dist=1, matched='2448')]
[Match(start=26, end=30, dist=1, matched='4480')]
[Match(start=24, end=29, dist=1, matched='22448')]
[Match(start=25, end=29, dist=1, matched='2448')]
[Match(start=24, end=28, dist=1, matched='2244')]
[Match(start=24, end=28, dist=1, matched='2244')]
[Match(start=26, end=31, dist=1, matched='4480-')]
[Match(start=25, end=29, dist=1, matched='2448')]
[Match(start=26, end=30, dist=1, matched='4480')]
[Match(start=26, end=30, dist=1, matched='4480')]
[Match(start=24, end=29, dist=1, matched='22448')]
[Match(start=25, end=29, dist=1, matched='2448')]
[Match(start=24, end=32, dist=1, matched='224480-5')]
[Match(start=24, end=29, dist=1, matched='22448')]
[Match(start=25, end=29, dist=1, matched='2448')]
My expected output:
[Match(start=24, end=32, dist=1, matched='224480-5')]
My questions:
Hi @sasi143,
I am still unsure about how you are receiving such output. fuzzysearch has special code to avoid returning such overlapping results. Also, a single call to find_near_matches() will return a single list of results, but the output you've supplied includes multiple lists (each containing a single Match object).
Are you calling find_near_matches() multiple times, perhaps in a loop? Could you post the piece of code that generated this output?
@taleinat, yes, you are correct. I am calling find_near_matches() in a loop.
Here is my code:
from fuzzysearch import find_near_matches

for i in item_number:
    score = find_near_matches(i, "chemicals nitrogen code-224480-5g", max_l_dist=1)
    if len(score) != 0:
        print(score)
@taleinat The results vary when I change the max_l_dist value, but I am not sure what the best value to pass is. Could you please help me with this?
@sasi143, for a single search, if you call find_near_matches() with a high value for max_l_dist, it will return all potential matches and you can choose the one with the lowest distance (dist) as the best match.
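To make the meaning of dist and max_l_dist concrete, here is a minimal sketch of Levenshtein distance in plain Python. It is not fuzzysearch's implementation, and the item number "224480-5" is a hypothetical value, since the contents of item_number are never shown in this thread. The candidates are substrings taken from the sample output above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


# Hypothetical item number (an assumption, not taken from the thread).
item = "224480-5"
# Candidate substrings copied from the sample output.
candidates = ["4480-5", "22448", "224480-5"]

# The best candidate is the one with the lowest distance to the item number.
best = min(candidates, key=lambda c: levenshtein(item, c))
print(best)
```

A candidate is only reported by a search when its distance is at most max_l_dist, so a larger max_l_dist admits more candidates and the lowest dist among them identifies the closest one.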
In your case, you're running multiple fuzzy searches and appear to want to choose the best result. Have you tried something like this?
# Each call returns a list of matches, so collect them all into one flat list.
matches = [
    match
    for i in item_number
    for match in find_near_matches(i, "chemicals nitrogen code-224480-5g", max_l_dist=1)
]
# Select the match with the lowest Levenshtein distance,
# and of those the one with the longest matched string.
best_result = max(matches, key=lambda match: (-match.dist, len(match.matched)))
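The selection rule can be tried on the sample output from this thread without installing anything. The sketch below uses a namedtuple as a stand-in for fuzzysearch's Match objects (an assumption made only so the example is self-contained) and a hand-copied subset of the per-search results. Because every search returns its own list, the lists are flattened before picking the best match.

```python
from collections import namedtuple

# Stand-in for fuzzysearch's Match objects, for illustration only.
Match = namedtuple("Match", ["start", "end", "dist", "matched"])

# One list per search, copied from the sample output in this thread.
# The empty list models a search that found nothing.
results = [
    [Match(26, 32, 1, "4480-5")],
    [Match(24, 29, 1, "22448")],
    [],
    [Match(24, 32, 1, "224480-5")],
    [Match(25, 29, 1, "2448")],
]

# Flatten the per-search lists into a single list of matches.
all_matches = [match for result in results for match in result]

# Lowest distance first; among equal distances, prefer the longest match.
# default=None avoids a ValueError when no search found anything.
best = max(
    all_matches,
    key=lambda match: (-match.dist, len(match.matched)),
    default=None,
)
print(best)  # Match(start=24, end=32, dist=1, matched='224480-5')
```

All of the sample matches have dist=1, so the length tie-break is what picks '224480-5' here, which is the expected output given earlier in the thread.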
This is a rather general programming question not directly related to fuzzysearch, and not relevant as a bug or enhancement suggestion, so I'm closing this issue.
In the future, I highly recommend getting programming help in more appropriate forums, such as the Stack Overflow Q&A website, the #python IRC channel or the python-tutor mailing list.
@taleinat Thank you very much for your time
Hi @sasi143, I will need more information to understand your question before I can help.
An example showing what you are trying to do would be the best.