universal-automata / liblevenshtein

Various utilities regarding Levenshtein transducers.
https://github.com/universal-automata/liblevenshtein
MIT License

Port the transducer to multiple languages #9

Open dylon opened 10 years ago

dylon commented 10 years ago

Target languages include:

... others?

fgregg commented 9 years ago

I'm very interested in seeing a python port so I can use it in my dedupe library: https://github.com/datamade/dedupe/pull/352#issuecomment-73629312

Right now, I've been playing around with the moman library, but it's much too slow at both indexing terms and searching.

How can I help?

dylon commented 9 years ago

Hi @fgregg,

I've been focusing on other projects lately, but I'll add you to the organization if you'd like to begin a Python port. It doesn't have to adhere to my original architecture, as I'm in the middle of rearchitecting it anyway, and it would be good to get more heads into the design process. Porting it from CoffeeScript to Python would be a mostly straightforward task.

Since my development is primarily in Java right now, I wanted to finish porting it to Java before moving on to other languages. That has turned into more of a research project, though, as I've been adding a lot of features beyond the original design (suffix tree searching, double array trie compression, etc.), so I may go ahead and port it directly to Python.