Closed tepperly closed 6 years ago
Sorry for the delayed response, and thanks for your pull request. Unfortunately, I will close this because code point conversion is no longer hardcoded and now uses the MRI API instead. That change also fixed #7.
Thank you for making jaro_winkler better 😃
I tried converting UTF-8 into uint16_t code points instead of unsigned long long. The Wikipedia article on UTF-8 says that this should be valid. On my machine, this makes the comparison faster.
The benchmarks prior to the change are
I expected it to be faster because it reduces the memory bandwidth requirements.
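To make the trade-off concrete, here is a minimal sketch in C of the check the narrower type implies. It is not code from this pull request or from jaro_winkler: the helper names `fits_in_uint16` and `buffer_bytes` are hypothetical, and it assumes the input has already been decoded into 32-bit code points. Unicode code points run up to U+10FFFF, so a 16-bit element can only hold those in the Basic Multilingual Plane (up to U+FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of jaro_winkler): returns 1 if every
 * decoded code point fits in a uint16_t, i.e. is at most U+FFFF,
 * and 0 if any code point would be truncated by the narrower type. */
static int fits_in_uint16(const uint32_t *codepoints, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (codepoints[i] > 0xFFFF) {
            return 0;
        }
    }
    return 1;
}

/* Hypothetical helper: bytes needed to store `len` code points
 * when each element occupies `elem_size` bytes. */
static size_t buffer_bytes(size_t len, size_t elem_size) {
    return len * elem_size;
}
```

For an 8-character string, `buffer_bytes(8, sizeof(uint16_t))` is 16 bytes, and the `unsigned long long` version needs at least four times that, which is where the bandwidth saving would come from. A string containing a character such as 😃 (U+1F603) fails `fits_in_uint16`, so a 16-bit buffer would need a fallback for such input.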