koheiw / proxyC

R package for large-scale similarity/distance computation
GNU General Public License v3.0

"hamman" is wrong #26

Closed koheiw closed 2 years ago

koheiw commented 2 years ago

We have a similarity method called "hamman", but it is wrong in two ways:

  1. The name is misspelled: it should be "hamann".
  2. It is computed incorrectly. The current implementation is:

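double simil_hamman(colvec& col_i, colvec& col_j, double weight = 1) {
    double e = accu(col_i == col_j);      // number of rows where the two columns are equal
    double u = col_i.n_rows - e;          // number of rows where they differ
    return (e - (u * weight)) / (e + u);  // mismatches are scaled by weight in the numerator only
}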

I inherited the misspelling from the proxy package, as shown below, but proxy computes the measure correctly: its formula is the same as the one in the paper.

> proxy::pr_DB$get_entry("hamman")
      names Hamman
        FUN pr_Hamman
   distance FALSE
     PREFUN NA
    POSTFUN NA
    convert pr_simil2dist
       type binary
       loop TRUE
      C_FUN FALSE
    PACKAGE proxy
       abcd TRUE
    formula ([a + d] - [b + c]) / n
  reference Hamann, U. (1961). Merkmalbestand und Verwandtschaftsbeziehungen der Farinosae. Ein Beitrag zum System der
            Monokotyledonen. Willdenowia, 2, pp. 639-768.
description The Hamman Matching Similarity for binary data. It is the proportion difference of the concordant and
            discordant pairs.
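
For reference, the formula in the proxy entry above, ([a + d] - [b + c]) / n, can be sketched in plain C++ as follows. This is only an illustration, not the code in proxy or proxyC: the function name `hamann_abcd` is hypothetical, and it assumes dense binary input where any non-zero value counts as "present".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hamann similarity for two binary vectors, following the formula shown in
// the proxy entry: ([a + d] - [b + c]) / n.
// NOTE: hamann_abcd is a hypothetical helper for illustration only; it is not
// part of proxy or proxyC.
double hamann_abcd(const std::vector<int>& x, const std::vector<int>& y) {
    // Assumption: both vectors are non-empty and have the same length.
    assert(!x.empty() && x.size() == y.size());

    double a = 0, b = 0, c = 0, d = 0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const bool xi = x[k] != 0;  // any non-zero value is treated as 1
        const bool yi = y[k] != 0;
        if (xi && yi) {
            ++a;  // present in both
        } else if (xi) {
            ++b;  // present in x only
        } else if (yi) {
            ++c;  // present in y only
        } else {
            ++d;  // absent in both
        }
    }
    const double n = a + b + c + d;
    // Concordant pairs (a + d) minus discordant pairs (b + c), as a share of n.
    return ((a + d) - (b + c)) / n;
}
```

For x = (1, 1, 0, 0, 1) and y = (1, 0, 0, 1, 1), the counts are a = 2, b = 1, c = 1, d = 1, so the value is (3 - 2) / 5 = 0.2. Identical vectors give 1 and fully opposite vectors give -1.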