Closed dheerajck closed 1 year ago
Don't you think that the parameter name processor can be confusing, and that something like string_preprocessor would be a better name?
I agree it is not a perfect name. The naming stems from fuzzywuzzy using the named argument processor in their process.* APIs. I added the argument to every scorer, which in hindsight wasn't a great idea. It saves the user very little typing:
Levenshtein.distance(s1, s2, processor=utils.default_process)
vs
Levenshtein.distance(utils.default_process(s1), utils.default_process(s2))
In addition, the performance difference is pretty small. For short sequences (<16 characters) the second implementation appears a couple of percent faster, and for longer ones calling the processor internally appears to be around 10% faster. So it only makes a difference when working with very fast scorers like Prefix/Postfix/Hamming and long sequences. Even then, when comparing multiple sequences you're better off using the scorer with the process.* APIs.
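To make the equivalence of the two call styles concrete, here is a minimal standard-library sketch. Both `simple_process` and `levenshtein` are hypothetical stand-ins written for illustration (they are not rapidfuzz's `utils.default_process` or `Levenshtein.distance`); the point is only that a `processor` argument amounts to applying the same function to both inputs before scoring.

```python
import re


def simple_process(s: str) -> str:
    """Hypothetical stand-in for a default preprocessor:
    lowercase, turn runs of non-alphanumerics into one space, strip."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def levenshtein(s1: str, s2: str, processor=None) -> int:
    """Toy Levenshtein distance with an optional processor argument.
    Illustrative only; not rapidfuzz's implementation."""
    if processor is not None:
        # The processor is simply applied to both inputs up front.
        s1, s2 = processor(s1), processor(s2)
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (c1 != c2)))  # substitution
        prev = cur
    return prev[-1]


a, b = "Hello, World!", "hello world"

# Style 1: let the scorer call the processor internally.
with_argument = levenshtein(a, b, processor=simple_process)

# Style 2: preprocess explicitly, then call the plain scorer.
explicit = levenshtein(simple_process(a), simple_process(b))

print(with_argument == explicit)
```

In this sketch the two styles differ only in where the preprocessing call is written, which matches the observation that the argument saves very little typing.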
For the process.* APIs that is a different story, since:
1) it saves more typing
2) I am able to call the preprocessing function in a more performant way
For these reasons I was actually playing with the thought of deprecating the processor argument in scorers.
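The thread does not spell out how the process.* APIs call the preprocessing function more efficiently. One plausible saving, assumed here purely for illustration, is that a batch API can preprocess the query once instead of once per comparison. The sketch below uses only the standard library; `simple_process`, `ratio`, `naive_best` and `batched_best` are hypothetical names, and `difflib.SequenceMatcher` stands in for a real scorer.

```python
import difflib


def counted(func):
    """Wrap a one-argument function and count how often it is called."""
    def wrapper(s):
        wrapper.calls += 1
        return func(s)
    wrapper.calls = 0
    return wrapper


def simple_process(s: str) -> str:
    """Hypothetical preprocessor: lowercase and collapse whitespace."""
    return " ".join(s.lower().split())


def ratio(s1: str, s2: str, processor=None) -> float:
    """Stand-in scorer with a scorer-level processor argument."""
    if processor is not None:
        s1, s2 = processor(s1), processor(s2)
    return difflib.SequenceMatcher(None, s1, s2).ratio()


def naive_best(query, choices, processor):
    """Scorer-level preprocessing: the query is reprocessed per choice."""
    return max(choices, key=lambda c: ratio(query, c, processor=processor))


def batched_best(query, choices, processor):
    """Batch-level preprocessing (assumed): the query is processed once."""
    processed_query = processor(query)
    return max(choices, key=lambda c: ratio(processed_query, processor(c)))


choices = ["New York Jets", "new york giants", "Dallas Cowboys"]
query = "  NEW york   jets "

naive_proc = counted(simple_process)
batched_proc = counted(simple_process)

naive_result = naive_best(query, choices, naive_proc)
batched_result = batched_best(query, choices, batched_proc)

# Same answer, but fewer preprocessing calls in the batched variant.
print(naive_result, naive_proc.calls)
print(batched_result, batched_proc.calls)
```

With n choices, the naive loop makes 2n preprocessing calls while the batched variant makes n + 1, so the gap grows with the number of choices.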
Updated readme
Added examples of WRatio, QRatio and updated score values. Added examples of string preprocessing.