Open stenglein-lab opened 6 years ago
By aping `string-similarity` with a different module, `fast-levenshtein`, we get the right answer for Okola_sample <=> Okala_sample, but lose the connection between NEPV and Nepuyo_sample. `fast-levenshtein` does appear to be slightly faster than `string-similarity`. Considering switching.
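
A possible explanation for the Okola/Okala difference, sketched with hand-rolled scoring functions rather than either library's actual code: as far as I can tell from its README, `string-similarity` scores pairs with a Dice coefficient over character bigrams, not an edit distance. The `levenshtein` and `dice` functions below are my own minimal, case-sensitive reimplementations for illustration, and the `Nola_sample` candidate is assumed from the edit-distance question raised in the original report.

```javascript
// Levenshtein edit distance (insert, delete, substitute), case-sensitive.
// Illustrative reimplementation, not the fast-levenshtein source.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Count each character bigram in a string.
function bigramCounts(s) {
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const gram = s.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice coefficient over character bigrams, case-sensitive.
// Illustrative reimplementation, not the string-similarity source.
function dice(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const ca = bigramCounts(a);
  const cb = bigramCounts(b);
  let shared = 0;
  for (const [gram, n] of ca) shared += Math.min(n, cb.get(gram) || 0);
  return (2 * shared) / (a.length - 1 + b.length - 1);
}

console.log(levenshtein("Okola_sample", "Okala_sample")); // 1
console.log(levenshtein("Okola_sample", "Nola_sample"));  // 2
console.log(dice("Okola_sample", "Okala_sample").toFixed(3)); // 0.818
console.log(dice("Okola_sample", "Nola_sample").toFixed(3));  // 0.857
```

Under these assumptions, edit distance prefers `Okala_sample` (1 versus 2), while the bigram score prefers `Nola_sample`: the substitution in the middle of the name breaks two bigrams, whereas the shorter name shares nine bigrams and has a smaller denominator.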
There seems to be disagreement between leaves of the trees Orthobunyavirus_M and Orthobunyavirus_L. Using cophy-treetools/bin/test_leaf_lookups.js, we get: M versus L:
... whereas L versus M gives:
Most replacements are misspellings or variations, but there are some cases where the best match isn't the same in both directions. Comparing the M tree to the L tree, `Nepuyo_sample` matches closest to `Aino_sample`. However, the reciprocal comparison omits this warning, suggesting that `Aino_sample` indeed exists in both the M and L trees, but `Nepuyo_sample` is just matching the closest string erroneously.

Likewise, there are discrepancies going the L to M direction.
`Okola_sample` should match `Okala_sample` in M, since they are one character different. Nola_sample --> Okola_sample is edit distance 2, right? Check the outcome of `stringsimilarity.bestMatch`.

Also, in L there is `NEPV` matching to `Nepuyo_sample` in M.

This condition will force any leaf without an appropriate match to attach to the most similarly named leaf, even if it's already matched. Reciprocal-best will eliminate this case.
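
A minimal sketch of what reciprocal-best matching could look like, assuming plain edit distance as the score. The helper names (`bestMatch`, `reciprocalBestMatches`) and the short leaf lists are hypothetical and are not taken from `cophy-treetools`; the real trees contain many more leaves.

```javascript
// Levenshtein edit distance, case-sensitive (illustrative implementation).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the candidate with the smallest edit distance to `name`
// (first one wins on ties), or null if there are no candidates.
function bestMatch(name, candidates) {
  let best = null;
  let bestDist = Infinity;
  for (const candidate of candidates) {
    const d = levenshtein(name, candidate);
    if (d < bestDist) {
      best = candidate;
      bestDist = d;
    }
  }
  return best;
}

// Keep a pair (a, b) only when b is a's best match AND a is b's best match.
// Leaves with no reciprocal partner are left out instead of being forced
// onto an already-matched leaf.
function reciprocalBestMatches(leavesA, leavesB) {
  const pairs = new Map();
  for (const a of leavesA) {
    const b = bestMatch(a, leavesB);
    if (b !== null && bestMatch(b, leavesA) === a) pairs.set(a, b);
  }
  return pairs;
}

// Hypothetical, heavily shortened leaf lists for illustration only.
const leavesM = ["Aino_sample", "Nepuyo_sample", "Okala_sample"];
const leavesL = ["Aino_sample", "NEPV", "Okola_sample"];

console.log(reciprocalBestMatches(leavesM, leavesL));
```

With these toy lists, `Aino_sample` and `Okala_sample`/`Okola_sample` pair up, while `Nepuyo_sample` and `NEPV` stay unmatched rather than attaching to `Aino_sample`. Unmatched leaves like these would then need a separate step, such as a manual alias table, which is outside this sketch.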