logannc / fuzzywuzzy-rs

Port of https://github.com/seatgeek/fuzzywuzzy
GNU General Public License v2.0

fuzzywuzzy-rs


Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
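For reference, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. The std-only sketch below is the textbook two-row dynamic-programming version; the function name `levenshtein` is illustrative and is not part of this crate's API.

```rust
/// Textbook Levenshtein distance over Unicode scalar values (`char`s).
/// Illustrative helper only; not an API exported by fuzzywuzzy.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // `prev[j]` = distance between the first i-1 chars of `a` and the
    // first j chars of `b`; only two rows are kept in memory.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        curr[0] = i; // delete all i chars of `a`
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    // After the final swap, `prev` holds the last computed row.
    prev[b.len()]
}

fn main() {
    println!("{}", levenshtein("kitten", "sitting")); // 3
    println!("{}", levenshtein("this is a test", "this is a test!")); // 1
}
```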

This is a Rust port of the Python package fuzzywuzzy. We aim to be a drop-in replacement for the original.

At the time of writing, our matching algorithm is based on the difflib implementation, which may, in rare cases, produce slightly different results than the Python Levenshtein implementation.
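To make the difflib approach concrete, here is a minimal sketch of a difflib-style ratio: repeatedly take the longest common block, recurse into the unmatched pieces on either side, then score 2 * M / T scaled to 0..=100, where M is the number of matched characters and T the combined length. It assumes no junk handling (Python's difflib only applies its "autojunk" heuristic to strings of 200+ characters) and uses a naive cubic search; the helper names are hypothetical and this is not the crate's actual implementation.

```rust
/// Longest block with a[i..i + k] == b[j..j + k] inside the given ranges.
/// Ties go to the smallest `i`, then the smallest `j`, like difflib's
/// `find_longest_match` (junk handling is ignored in this sketch).
fn longest_match(a: &[char], b: &[char], alo: usize, ahi: usize, blo: usize, bhi: usize) -> (usize, usize, usize) {
    let (mut best_i, mut best_j, mut best_k) = (alo, blo, 0);
    for i in alo..ahi {
        for j in blo..bhi {
            let mut k = 0;
            while i + k < ahi && j + k < bhi && a[i + k] == b[j + k] {
                k += 1;
            }
            // Strictly greater keeps the earliest (i, j) on ties.
            if k > best_k {
                best_i = i;
                best_j = j;
                best_k = k;
            }
        }
    }
    (best_i, best_j, best_k)
}

/// Total size of the matching blocks: take the longest match, then
/// recurse into the unmatched regions to its left and right.
fn matched_len(a: &[char], b: &[char], alo: usize, ahi: usize, blo: usize, bhi: usize) -> usize {
    let (i, j, k) = longest_match(a, b, alo, ahi, blo, bhi);
    if k == 0 {
        return 0;
    }
    k + matched_len(a, b, alo, i, blo, j) + matched_len(a, b, i + k, ahi, j + k, bhi)
}

/// difflib-style similarity, 0..=100. `f64::round` rounds half away from
/// zero, while Python 3's `round` rounds half to even, so exact .5 cases
/// could differ from the Python original.
fn ratio(s1: &str, s2: &str) -> u8 {
    let a: Vec<char> = s1.chars().collect();
    let b: Vec<char> = s2.chars().collect();
    let total = a.len() + b.len();
    if total == 0 {
        return 100; // sketch choice: two empty strings count as identical
    }
    let m = matched_len(&a, &b, 0, a.len(), 0, b.len());
    (200.0 * m as f64 / total as f64).round() as u8
}

fn main() {
    // Same pairs as the Usage examples; prints 97, 91 and 84.
    println!("{}", ratio("this is a test", "this is a test!"));
    println!("{}", ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"));
    println!("{}", ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"));
}
```

A Levenshtein-based ratio would instead derive its score from an edit distance, which is why the two approaches can occasionally disagree by a point or so.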

NOTE: This project was originally named fuzzyrusty, but someone else cloned it and published it to crates.io. We have since recovered the crate, but had already renamed this project to differentiate it.

Installation

fuzzywuzzy is currently available through GitHub or crates.io.

For the latest stable release, add this to your Cargo.toml:

[dependencies]
fuzzywuzzy = "*"

For the bleeding edge, you can pull directly from master:

[dependencies]
fuzzywuzzy = { git = "https://github.com/logannc/fuzzywuzzy-rs", branch = "master" }

Documentation

Clone the repository and run cargo doc --open, or visit docs.rs.

Usage

The snippets below assume these imports:

use fuzzywuzzy::{fuzz, process, utils};

Simple Ratio

assert_eq!(fuzz::ratio("this is a test", "this is a test!"), 97);

Partial Ratio

assert_eq!(fuzz::partial_ratio("this is a test", "this is a test!"), 100);

Token Sort Ratio

assert_eq!(fuzz::ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"), 91);
assert_eq!(fuzz::token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear", true, true), 100);
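The jump from 91 to 100 comes from normalizing word order before scoring. The sketch below shows that preprocessing step, assuming behaviour similar to the Python original: non-alphanumeric characters become spaces, the text is lowercased and trimmed, and the tokens are sorted and re-joined. The helper names `process` and `token_sort_key` are hypothetical, not this crate's API.

```rust
/// Rough analogue of the Python original's default processing: every
/// non-alphanumeric char becomes a space, then lowercase and trim.
/// (Python's regex also keeps `_`; this sketch does not.)
fn process(s: &str) -> String {
    let spaced: String = s
        .chars()
        .map(|c| if c.is_alphanumeric() { c } else { ' ' })
        .collect();
    spaced.to_lowercase().trim().to_string()
}

/// Sort the processed tokens and re-join them with single spaces,
/// so word order no longer affects the score.
fn token_sort_key(s: &str) -> String {
    let processed = process(s);
    let mut tokens: Vec<&str> = processed.split_whitespace().collect();
    tokens.sort_unstable();
    tokens.join(" ")
}

fn main() {
    // Both reduce to "a bear fuzzy was wuzzy", so a ratio on the keys is 100.
    println!("{}", token_sort_key("fuzzy wuzzy was a bear"));
    println!("{}", token_sort_key("Wuzzy, fuzzy was a bear!"));
}
```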

Token Set Ratio

assert_eq!(fuzz::ratio("fuzzy was a bear", "fuzzy fuzzy was a bear"), 84);
assert_eq!(fuzz::token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear", true, true), 100);
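Token set ratio goes further and also ignores repeated words. As we understand the Python original, it builds three strings (the sorted token intersection t0, plus t0 extended with the tokens unique to each input) and takes the best pairwise ratio among them. The sketch below only constructs those strings; the helper names are hypothetical and the final ratio step is described in a comment rather than implemented.

```rust
use std::collections::BTreeSet;

/// Lowercase, split on non-alphanumerics, and collect distinct tokens.
/// A BTreeSet keeps them sorted and drops duplicates like the extra "fuzzy".
fn token_set(s: &str) -> BTreeSet<String> {
    s.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Join already-sorted tokens with single spaces.
fn join<'a>(tokens: impl Iterator<Item = &'a String>) -> String {
    tokens.map(String::as_str).collect::<Vec<_>>().join(" ")
}

/// t0 = sorted intersection, t1 = t0 + tokens only in `s1`,
/// t2 = t0 + tokens only in `s2`. The score would then be the max of
/// ratio(t0, t1), ratio(t0, t2) and ratio(t1, t2).
fn token_set_strings(s1: &str, s2: &str) -> (String, String, String) {
    let (a, b) = (token_set(s1), token_set(s2));
    let t0 = join(a.intersection(&b));
    let t1 = format!("{} {}", t0, join(a.difference(&b))).trim().to_string();
    let t2 = format!("{} {}", t0, join(b.difference(&a))).trim().to_string();
    (t0, t1, t2)
}

fn main() {
    let (t0, t1, t2) = token_set_strings("fuzzy was a bear", "fuzzy fuzzy was a bear");
    // All three are "a bear fuzzy was", so every pairwise ratio is 100.
    println!("{:?} {:?} {:?}", t0, t1, t2);
}
```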

Process

assert_eq!(process::extract_one(
    "cowboys",                                              // query
    &["Atlanta Falcons", "Dallas Cowboys", "New York Jets"], // choices
    &utils::full_process,                                   // processor
    &fuzz::wratio,                                          // scorer
    0,                                                      // score cutoff
), Some(("Dallas Cowboys".to_string(), 90)));
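Conceptually, extract_one preprocesses the query and each choice, scores every pair, and returns the best original choice whose score meets the cutoff. The sketch below illustrates that loop with generic closures; its signature is an assumption, not the crate's, and `token_overlap` is a deliberately simple hypothetical scorer standing in for `fuzz::wratio`, so its scores will not match the 90 above.

```rust
/// Hypothetical stand-in for `process::extract_one`: process the query and
/// every choice, score each pair, and return the best-scoring original
/// choice whose score is >= `score_cutoff`. Ties keep the earliest choice.
fn extract_one<P, S>(
    query: &str,
    choices: &[&str],
    processor: P,
    scorer: S,
    score_cutoff: u8,
) -> Option<(String, u8)>
where
    P: Fn(&str) -> String,
    S: Fn(&str, &str) -> u8,
{
    let q = processor(query);
    let mut best: Option<(String, u8)> = None;
    for &choice in choices {
        let score = scorer(&q, &processor(choice));
        if score < score_cutoff {
            continue;
        }
        let better = match &best {
            None => true,
            Some((_, s)) => score > *s,
        };
        if better {
            best = Some((choice.to_string(), score));
        }
    }
    best
}

/// Toy scorer for this demo only (not `wratio`): the percentage of query
/// tokens that also appear among the choice's tokens.
fn token_overlap(q: &str, c: &str) -> u8 {
    let q_tokens: Vec<&str> = q.split_whitespace().collect();
    if q_tokens.is_empty() {
        return 0;
    }
    let hits = q_tokens
        .iter()
        .filter(|t| c.split_whitespace().any(|ct| ct == **t))
        .count();
    (100 * hits / q_tokens.len()) as u8
}

fn main() {
    let teams = ["Atlanta Falcons", "Dallas Cowboys", "New York Jets"];
    let best = extract_one("cowboys", &teams, |s| s.to_lowercase(), token_overlap, 0);
    println!("{:?}", best); // Some(("Dallas Cowboys", 100))
}
```

Returning the original (unprocessed) choice keeps the result readable, which matches what the README example shows.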