rapidfuzz / rapidfuzz-rs

Rapid fuzzy string matching in Rust using various string metrics
https://docs.rs/rapidfuzz/latest/rapidfuzz/
Apache License 2.0

Interest in a ruby implementation? #7

Open dgollahon opened 1 month ago

dgollahon commented 1 month ago

Hi,

I am interested in using rapidfuzz-rs from Ruby through magnus. I have no problem doing this just for myself (it's very straightforward), but I was wondering whether it would make sense to open-source a project for others. I am happy to release it under my own GitHub account or "donate" it to this organization if that is desirable/helpful. I don't want to squat the rapidfuzz gem name if this group or someone else would like to own it.

Thanks! Daniel

maxbachmann commented 1 month ago

I think placing it in the rapidfuzz organisation would make sense, so that people can find it more easily. For publishing the gem, it would probably make sense to use some trusted publishing system via GitHub Actions, similar to what is done for the Python version of the library.

There are a couple of things that I did differently in the Python version compared to the C++/Rust version to make it more useful for Python users:

I have never used Ruby myself, so I can't help with any Ruby-specific questions, but I would be more than happy to help with any questions regarding the library.

dgollahon commented 1 month ago

Ok, that makes sense.

A native Ruby fallback is probably something I don't have time to implement, but I think a relatively "dumb" port using the magnus tooling I mentioned above would not be a heavy lift. I'm not sure exactly when I'll get to this, but I plan to put up a draft repo at some point, possibly reserve the relevant gem name, and then figure out the publishing lifecycle later on.

I think the overhead for functions bound via magnus (and so, indirectly, via the C APIs) should be reasonable for most use cases. Using the osa_distance function, I found some minor test workloads to be 5-150 times as fast as a similar C-based gem in the ecosystem.
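For readers unfamiliar with the metric, here is a minimal sketch of what an OSA (optimal string alignment) distance computes. It is a naive dynamic-programming reference in plain Rust, not the implementation used by rapidfuzz-rs, and the function name and signature here are assumptions made for illustration only.

```rust
/// Naive optimal string alignment (restricted Damerau-Levenshtein) distance.
/// Illustrative reference only; this is NOT rapidfuzz-rs's implementation.
fn osa_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());

    // d[i][j] = distance between the first i chars of `a` and the first j chars of `b`.
    let mut d = vec![vec![0usize; m + 1]; n + 1];
    for i in 0..=n {
        d[i][0] = i; // delete all i chars
    }
    for j in 0..=m {
        d[0][j] = j; // insert all j chars
    }

    for i in 1..=n {
        for j in 1..=m {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let mut best = (d[i - 1][j] + 1) // deletion
                .min(d[i][j - 1] + 1) // insertion
                .min(d[i - 1][j - 1] + cost); // match or substitution
            // Transposition of two adjacent characters counts as one edit.
            if i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1] {
                best = best.min(d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    d[n][m]
}
```

With this sketch, `osa_distance("ab", "ba")` is 1 (one transposition) and `osa_distance("ca", "abc")` is 3, since OSA does not allow a substring to be edited more than once.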

maxbachmann commented 1 month ago

Yes, I started out without all of these things in the Python version as well, and added them as I had the time and need for them.

Wrapping the API using something like magnus is probably not too much work, since most of the functions share a similar interface.
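To illustrate why a shared interface keeps the wrapping work small, here is a rough sketch in plain Rust. Everything in it is hypothetical: the `Metric` alias, the two stand-in metrics, and the `call_metric` helper are simplified placeholders rather than rapidfuzz-rs or magnus APIs. The point is only that when every metric takes two strings and returns a number, one piece of glue code can serve all of them.

```rust
/// Hypothetical shared shape: two strings in, one integer score out.
type Metric = fn(&str, &str) -> usize;

/// Stand-in metric: number of differing positions; surplus characters in the
/// longer string count as mismatches. Not rapidfuzz-rs's implementation.
fn hamming_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mismatches = a.iter().zip(b.iter()).filter(|(x, y)| x != y).count();
    mismatches + a.len().abs_diff(b.len())
}

/// Stand-in metric: length of the common prefix, in characters.
fn prefix_similarity(a: &str, b: &str) -> usize {
    a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count()
}

/// Hypothetical glue: look a metric up by name and call it. A real binding
/// would register each function with the host language here instead.
fn call_metric(name: &str, a: &str, b: &str) -> Option<usize> {
    let f: Metric = match name {
        "hamming" => hamming_distance,
        "prefix" => prefix_similarity,
        _ => return None, // unknown metric name
    };
    Some(f(a, b))
}
```

Adding another metric with the same shape would mean one more match arm, which is roughly the kind of repetition a binding layer can turn into a macro or a loop.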