koheiw / proxyC

R package for large-scale similarity/distance computation
GNU General Public License v3.0

References for methods #34

Closed (rcannood closed 10 months ago)

rcannood commented 2 years ago

I really like that the documentation now has a bit of information on what the different dist and simil methods are.

It reminds me a bit of the information you can extract from proxy::pr_DB$get_entries(). For example, for the Kullback-Leibler distance:

$Kullback
      names Kullback, Leibler
        FUN pr_KullbackLeibler
   distance TRUE
     PREFUN NA
    POSTFUN NA
    convert pr_dist2simil
       type metric
       loop TRUE
      C_FUN FALSE
    PACKAGE proxy
       abcd FALSE
    formula sum_i [x_i * log((x_i / sum_j x_j) / (y_i / sum_j y_j)) /
            sum_j x_j)]
  reference Kullback S., and Leibler, R.A. (1951). On information and
            sufficiency. The Annals of Mathematical Statistics, vol.
            22, pp. 79--86
description The Kullback-Leibler-distance.
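To make the `formula` field above concrete, here is a minimal base-R sketch of how I read it: each x_i and y_i is divided by its vector's sum, and the log-ratio is weighted by the normalised x_i. The function name `kl_by_hand` and the example vectors are made up for illustration, and the sketch assumes strictly positive inputs; it is not taken from proxy or proxyC.

```r
# Hand-rolled Kullback-Leibler distance, following the printed formula:
#   sum_i [ (x_i / sum_j x_j) * log((x_i / sum_j x_j) / (y_i / sum_j y_j)) ]
# Assumption: x and y are strictly positive numeric vectors of equal length.
kl_by_hand <- function(x, y) {
  p <- x / sum(x)  # normalise x to proportions
  q <- y / sum(y)  # normalise y to proportions
  sum(p * log(p / q))
}

x <- c(1, 2, 3)
y <- c(3, 2, 1)

kl_xy <- kl_by_hand(x, y)  # positive, because x and y differ
kl_xx <- kl_by_hand(x, x)  # identical vectors give 0
print(c(kl_xy = kl_xy, kl_xx = kl_xx))
```

Note that swapping the arguments generally gives a different value, which is worth stating next to the formula wherever it ends up in the documentation.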

I think it'd be relatively easy to add the references to the documentation as well.

Should this simply be added to the description, or should we create two large data frames with all of the metadata?
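As a rough illustration of the data frame option, the sketch below flattens a few fields of the proxy registry into one table. It assumes the proxy package is installed and that each entry exposes the `distance`, `formula` and `reference` fields shown in the output above; the helper `first_chr` and the column names are hypothetical, and proxyC would need its own metadata source.

```r
library(proxy)

# All registered measures, as a named list (one element per entry).
entries <- pr_DB$get_entries()

# Helper (hypothetical): first element of a field as character, NA if missing.
first_chr <- function(x) as.character(x)[1]

# One row per measure; distances and similarities could then be split
# into two data frames with the `distance` column.
meta <- data.frame(
  name      = names(entries),
  distance  = vapply(entries, function(e) isTRUE(e$distance), logical(1)),
  formula   = vapply(entries, function(e) first_chr(e$formula), character(1)),
  reference = vapply(entries, function(e) first_chr(e$reference), character(1)),
  stringsAsFactors = FALSE
)

dists  <- meta[meta$distance, ]
simils <- meta[!meta$distance, ]
```

A table like this is easy to render in a vignette, whereas putting the same text in each method's description keeps everything on the man page.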

koheiw commented 2 years ago

I agree that it is nice to have formulas to explain how scores are calculated. Let's add them as an itemized list in the man page. We can create separate pages for simil and dist if it becomes too long.

koheiw commented 2 years ago

@rcannood, I wrote a first draft of the equations. Since there are many, I wrote them in an Rmd file and generated an HTML file.

https://github.com/koheiw/proxyC/tree/man-equation/misc

Do you think they fit best in the man page, a vignette, or a pkgdown site? I welcome your comments on the notation too.

rcannood commented 1 year ago

Thanks for your quick response!

It might be too big for a man page. Maybe a vignette would be a better fit?

Setting up a pkgdown site would also be a good idea!