vickumar1981 / stringdistance

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
https://vickumar1981.github.io/stringdistance/api/com/github/vickumar1981/stringdistance/index.html

JMH benchmarks #61

Closed by JD557 4 years ago

JD557 commented 4 years ago

As a follow-up to https://github.com/vickumar1981/stringdistance/pull/58#pullrequestreview-517270531, this PR adds some JMH benchmarks to the ArrayDistance methods.

Here are the results of running jmh:run -i 1 -wi 1 -f1 -t1 (one measurement iteration, one warmup iteration, one fork, one thread) on my 2015 MacBook Pro.

Benchmark                                                 Mode  Cnt          Score   Error  Units
ArrayDistanceBenchmarks.largeDiffCosineTest              thrpt          441039.885          ops/s
ArrayDistanceBenchmarks.largeDiffDamerauTest             thrpt           18397.840          ops/s
ArrayDistanceBenchmarks.largeDiffDiceCoefficientTest     thrpt          613988.840          ops/s
ArrayDistanceBenchmarks.largeDiffHammingTest             thrpt         1104132.741          ops/s
ArrayDistanceBenchmarks.largeDiffJaccardTest             thrpt           30540.841          ops/s
ArrayDistanceBenchmarks.largeDiffJaroTest                thrpt            6444.788          ops/s
ArrayDistanceBenchmarks.largeDiffJaroWinklerTest         thrpt            6424.645          ops/s
ArrayDistanceBenchmarks.largeDiffLevenshteinTest         thrpt           11487.328          ops/s
ArrayDistanceBenchmarks.largeDiffNGramDistTest           thrpt           30818.115          ops/s
ArrayDistanceBenchmarks.largeDiffNGramScoreTest          thrpt           30233.446          ops/s
ArrayDistanceBenchmarks.largeDiffNeedlemanWunschTest     thrpt           17569.708          ops/s
ArrayDistanceBenchmarks.largeDiffOverlapTest             thrpt           30068.981          ops/s
ArrayDistanceBenchmarks.largeDiffSmithWatermanGotohTest  thrpt           17950.350          ops/s
ArrayDistanceBenchmarks.largeDiffSmithWatermanTest       thrpt            1425.146          ops/s
ArrayDistanceBenchmarks.largeDiffTverskyTest             thrpt           13631.024          ops/s
ArrayDistanceBenchmarks.largeSameCosineTest              thrpt          469210.378          ops/s
ArrayDistanceBenchmarks.largeSameDamerauTest             thrpt           18371.691          ops/s
ArrayDistanceBenchmarks.largeSameDiceCoefficientTest     thrpt          607873.432          ops/s
ArrayDistanceBenchmarks.largeSameHammingTest             thrpt         1197847.588          ops/s
ArrayDistanceBenchmarks.largeSameJaccardTest             thrpt           82927.656          ops/s
ArrayDistanceBenchmarks.largeSameJaroTest                thrpt           99501.217          ops/s
ArrayDistanceBenchmarks.largeSameJaroWinklerTest         thrpt           92704.896          ops/s
ArrayDistanceBenchmarks.largeSameLevenshteinTest         thrpt           12271.563          ops/s
ArrayDistanceBenchmarks.largeSameNGramDistTest           thrpt           82664.286          ops/s
ArrayDistanceBenchmarks.largeSameNGramScoreTest          thrpt           83175.825          ops/s
ArrayDistanceBenchmarks.largeSameNeedlemanWunschTest     thrpt         1757868.805          ops/s
ArrayDistanceBenchmarks.largeSameOverlapTest             thrpt           82676.886          ops/s
ArrayDistanceBenchmarks.largeSameSmithWatermanGotohTest  thrpt           19145.435          ops/s
ArrayDistanceBenchmarks.largeSameSmithWatermanTest       thrpt            1489.250          ops/s
ArrayDistanceBenchmarks.largeSameTverskyTest             thrpt           81298.891          ops/s
ArrayDistanceBenchmarks.smallDiffCosineTest              thrpt         1048530.501          ops/s
ArrayDistanceBenchmarks.smallDiffDamerauTest             thrpt          393984.004          ops/s
ArrayDistanceBenchmarks.smallDiffDiceCoefficientTest     thrpt         1788681.339          ops/s
ArrayDistanceBenchmarks.smallDiffHammingTest             thrpt         4260911.428          ops/s
ArrayDistanceBenchmarks.smallDiffJaccardTest             thrpt          276966.223          ops/s
ArrayDistanceBenchmarks.smallDiffJaroTest                thrpt          403614.692          ops/s
ArrayDistanceBenchmarks.smallDiffJaroWinklerTest         thrpt          376493.104          ops/s
ArrayDistanceBenchmarks.smallDiffLevenshteinTest         thrpt          263856.976          ops/s
ArrayDistanceBenchmarks.smallDiffLongestCommonSeqTest    thrpt             432.280          ops/s
ArrayDistanceBenchmarks.smallDiffNGramDistTest           thrpt          283573.382          ops/s
ArrayDistanceBenchmarks.smallDiffNGramScoreTest          thrpt          282893.455          ops/s
ArrayDistanceBenchmarks.smallDiffNeedlemanWunschTest     thrpt          340344.335          ops/s
ArrayDistanceBenchmarks.smallDiffOverlapTest             thrpt          277022.468          ops/s
ArrayDistanceBenchmarks.smallDiffSmithWatermanGotohTest  thrpt          385030.029          ops/s
ArrayDistanceBenchmarks.smallDiffSmithWatermanTest       thrpt          156275.448          ops/s
ArrayDistanceBenchmarks.smallDiffTverskyTest             thrpt          211475.612          ops/s
ArrayDistanceBenchmarks.smallSameCosineTest              thrpt         1129606.449          ops/s
ArrayDistanceBenchmarks.smallSameDamerauTest             thrpt          375486.316          ops/s
ArrayDistanceBenchmarks.smallSameDiceCoefficientTest     thrpt         1147438.239          ops/s
ArrayDistanceBenchmarks.smallSameHammingTest             thrpt         4079188.236          ops/s
ArrayDistanceBenchmarks.smallSameJaccardTest             thrpt         1390518.064          ops/s
ArrayDistanceBenchmarks.smallSameJaroTest                thrpt          551722.743          ops/s
ArrayDistanceBenchmarks.smallSameJaroWinklerTest         thrpt          589250.679          ops/s
ArrayDistanceBenchmarks.smallSameLevenshteinTest         thrpt          246561.757          ops/s
ArrayDistanceBenchmarks.smallSameLongestCommonSeqTest    thrpt         9078936.082          ops/s
ArrayDistanceBenchmarks.smallSameNGramDistTest           thrpt         1428757.599          ops/s
ArrayDistanceBenchmarks.smallSameNGramScoreTest          thrpt         1411156.069          ops/s
ArrayDistanceBenchmarks.smallSameNeedlemanWunschTest     thrpt         6126023.083          ops/s
ArrayDistanceBenchmarks.smallSameOverlapTest             thrpt         1401205.916          ops/s
ArrayDistanceBenchmarks.smallSameSmithWatermanGotohTest  thrpt          381707.698          ops/s
ArrayDistanceBenchmarks.smallSameSmithWatermanTest       thrpt          161300.496          ops/s
ArrayDistanceBenchmarks.smallSameTverskyTest             thrpt         1255910.582          ops/s

I had to disable the large*LongestCommonSeqTest benchmarks, since they took a very long time to run.
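The gap between smallDiffLongestCommonSeqTest (432.280 ops/s) and smallSameLongestCommonSeqTest (9078936.082 ops/s) is the pattern one would expect from a plain recursive longest-common-subsequence without memoization: identical inputs match at every step and finish in linear time, while differing inputs branch twice per mismatch. This is only an assumption about the cause and has not been checked against the library source. The sketch below is a hypothetical illustration (the class and method names `LcsSketch`, `lcsNaive`, and `lcsDp` are made up and are not part of stringdistance), contrasting the naive recursion with a standard dynamic-programming table.

```java
public class LcsSketch {
    // Naive recursion (hypothetical, not the library's code).
    // Linear when every element matches, exponential in the worst case
    // when the inputs share few elements.
    static int lcsNaive(int[] a, int[] b, int i, int j) {
        if (i == a.length || j == b.length) {
            return 0;
        }
        if (a[i] == b[j]) {
            return 1 + lcsNaive(a, b, i + 1, j + 1);
        }
        return Math.max(lcsNaive(a, b, i + 1, j), lcsNaive(a, b, i, j + 1));
    }

    // Bottom-up dynamic programming: O(n * m) time and space,
    // regardless of how similar the inputs are.
    static int lcsDp(int[] a, int[] b) {
        int[][] dp = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                if (a[i] == b[j]) {
                    dp[i][j] = 1 + dp[i + 1][j + 1];
                } else {
                    dp[i][j] = Math.max(dp[i + 1][j], dp[i][j + 1]);
                }
            }
        }
        return dp[0][0];
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3, 4};
        int[] y = {2, 4, 5};
        // Both approaches compute the same length; only the cost differs.
        System.out.println(lcsNaive(x, y, 0, 0));
        System.out.println(lcsDp(x, y));
    }
}
```

If the library's implementation does turn out to behave like the naive version, a table-based variant along these lines might make the large* benchmarks feasible to re-enable.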

coveralls commented 4 years ago

Pull Request Test Coverage Report for Build 236


Totals Coverage Status
Change from base Build 235: 0.0%
Covered Lines: 373
Relevant Lines: 383

vickumar1981 commented 4 years ago

@JD557 I think we should document how to run the benchmarks somewhere, maybe in CONTRIBUTING.md:

i.e., ./sbt bench/jmh:run -i 1 -wi 1 -f1 -t1. I'll update the ticket.

Thanks again. If you have any more suggestions or ideas, I would love to add them to the issue.

vickumar1981 commented 4 years ago

Addresses https://github.com/vickumar1981/stringdistance/issues/59

vickumar1981 commented 4 years ago

Here are my results using the same parameters, on a Dell XPS 15 (16 GB RAM, Intel® Core™ i7-8750H CPU @ 2.20GHz × 6):


 Benchmark                                                Mode  Cnt         Score   Error  Units
 ArrayDistanceBenchmarks.largeDiffCosineTest              thrpt         463256.862          ops/s
 ArrayDistanceBenchmarks.largeDiffDamerauTest             thrpt          19882.380          ops/s
 ArrayDistanceBenchmarks.largeDiffDiceCoefficientTest     thrpt         631493.535          ops/s
 ArrayDistanceBenchmarks.largeDiffHammingTest             thrpt        1464930.412          ops/s
 ArrayDistanceBenchmarks.largeDiffJaccardTest             thrpt          35501.091          ops/s
 ArrayDistanceBenchmarks.largeDiffJaroTest                thrpt           8291.683          ops/s
 ArrayDistanceBenchmarks.largeDiffJaroWinklerTest         thrpt           8316.724          ops/s
 ArrayDistanceBenchmarks.largeDiffLevenshteinTest         thrpt          13307.179          ops/s
 ArrayDistanceBenchmarks.largeDiffNGramDistTest           thrpt          35880.455          ops/s
 ArrayDistanceBenchmarks.largeDiffNGramScoreTest          thrpt          31889.103          ops/s
 ArrayDistanceBenchmarks.largeDiffNeedlemanWunschTest     thrpt          16332.953          ops/s
 ArrayDistanceBenchmarks.largeDiffOverlapTest             thrpt          31283.130          ops/s
 ArrayDistanceBenchmarks.largeDiffSmithWatermanGotohTest  thrpt          16083.343          ops/s
 ArrayDistanceBenchmarks.largeDiffSmithWatermanTest       thrpt           1247.211          ops/s
 ArrayDistanceBenchmarks.largeDiffTverskyTest             thrpt          11251.254          ops/s
 ArrayDistanceBenchmarks.largeSameCosineTest              thrpt         407963.255          ops/s
 ArrayDistanceBenchmarks.largeSameDamerauTest             thrpt          13001.855          ops/s
 ArrayDistanceBenchmarks.largeSameDiceCoefficientTest     thrpt         482972.464          ops/s
 ArrayDistanceBenchmarks.largeSameHammingTest             thrpt        1058518.906          ops/s
 ArrayDistanceBenchmarks.largeSameJaccardTest             thrpt          77212.547          ops/s
 ArrayDistanceBenchmarks.largeSameJaroTest                thrpt          92520.326          ops/s
 ArrayDistanceBenchmarks.largeSameJaroWinklerTest         thrpt          84377.621          ops/s
 ArrayDistanceBenchmarks.largeSameLevenshteinTest         thrpt          10648.974          ops/s
 ArrayDistanceBenchmarks.largeSameNGramDistTest           thrpt          75980.752          ops/s
 ArrayDistanceBenchmarks.largeSameNGramScoreTest          thrpt          75287.870          ops/s
 ArrayDistanceBenchmarks.largeSameNeedlemanWunschTest     thrpt        1474620.965          ops/s
 ArrayDistanceBenchmarks.largeSameOverlapTest             thrpt          74760.483          ops/s
 ArrayDistanceBenchmarks.largeSameSmithWatermanGotohTest  thrpt          15637.325          ops/s
 ArrayDistanceBenchmarks.largeSameSmithWatermanTest       thrpt           1163.969          ops/s
 ArrayDistanceBenchmarks.largeSameTverskyTest             thrpt          74420.307          ops/s
 ArrayDistanceBenchmarks.smallDiffCosineTest              thrpt         881964.240          ops/s
 ArrayDistanceBenchmarks.smallDiffDamerauTest             thrpt         295043.787          ops/s
 ArrayDistanceBenchmarks.smallDiffDiceCoefficientTest     thrpt        1390611.681          ops/s
 ArrayDistanceBenchmarks.smallDiffHammingTest             thrpt        3175494.749          ops/s
 ArrayDistanceBenchmarks.smallDiffJaccardTest             thrpt         235655.354          ops/s
 ArrayDistanceBenchmarks.smallDiffJaroTest                thrpt         368428.064          ops/s
 ArrayDistanceBenchmarks.smallDiffJaroWinklerTest         thrpt         340532.643          ops/s
 ArrayDistanceBenchmarks.smallDiffLevenshteinTest         thrpt         218356.645          ops/s
 ArrayDistanceBenchmarks.smallDiffLongestCommonSeqTest    thrpt            351.577          ops/s
 ArrayDistanceBenchmarks.smallDiffNGramDistTest           thrpt         243391.176          ops/s
 ArrayDistanceBenchmarks.smallDiffNGramScoreTest          thrpt         248387.995          ops/s
 ArrayDistanceBenchmarks.smallDiffNeedlemanWunschTest     thrpt         348656.779          ops/s
 ArrayDistanceBenchmarks.smallDiffOverlapTest             thrpt         244052.050          ops/s
 ArrayDistanceBenchmarks.smallDiffSmithWatermanGotohTest  thrpt         334112.438          ops/s
 ArrayDistanceBenchmarks.smallDiffSmithWatermanTest       thrpt         129757.925          ops/s
 ArrayDistanceBenchmarks.smallDiffTverskyTest             thrpt         183541.753          ops/s
 ArrayDistanceBenchmarks.smallSameCosineTest              thrpt         994661.654          ops/s
 ArrayDistanceBenchmarks.smallSameDamerauTest             thrpt         276295.859          ops/s
 ArrayDistanceBenchmarks.smallSameDiceCoefficientTest     thrpt        1004635.635          ops/s
 ArrayDistanceBenchmarks.smallSameHammingTest             thrpt        3537342.102          ops/s
 ArrayDistanceBenchmarks.smallSameJaccardTest             thrpt        1280571.412          ops/s
 ArrayDistanceBenchmarks.smallSameJaroTest                thrpt         506491.705          ops/s
 ArrayDistanceBenchmarks.smallSameJaroWinklerTest         thrpt         510600.949          ops/s
 ArrayDistanceBenchmarks.smallSameLevenshteinTest         thrpt         220997.361          ops/s
 ArrayDistanceBenchmarks.smallSameLongestCommonSeqTest    thrpt        7151477.247          ops/s
 ArrayDistanceBenchmarks.smallSameNGramDistTest           thrpt        1327008.651          ops/s
 ArrayDistanceBenchmarks.smallSameNGramScoreTest          thrpt        1302470.502          ops/s
 ArrayDistanceBenchmarks.smallSameNeedlemanWunschTest     thrpt       13202646.597          ops/s
 ArrayDistanceBenchmarks.smallSameOverlapTest             thrpt        1250181.714          ops/s
 ArrayDistanceBenchmarks.smallSameSmithWatermanGotohTest  thrpt         329937.512          ops/s
 ArrayDistanceBenchmarks.smallSameSmithWatermanTest       thrpt         128986.998          ops/s
 ArrayDistanceBenchmarks.smallSameTverskyTest             thrpt        1124017.169          ops/s
[success] Total time: 1263 s (21:03), completed Nov 1, 2020 2:34:05 PM