Open DarioS opened 10 months ago
Probably should say "See ?buildKmknn
for details", as it just forwards its arguments to that. Happy to take a PR for the doc fix (might consider doing it for all constructors). I couldn't be bothered repeating the whole vector of options in the KmknnParam
constructor, then I would have to update things in two places if I ever added a new distance metric.
It seems that all methods are restrained to distance = c("Euclidean", "Manhattan", "Cosine")
(see ?bndistance
). I suppose we could also have method-specific distances, for those that support them, e.g., Hamming for Annoy. I haven't come across a demand for that yet, though again, PRs are welcome if someone can be bothered.
Note that my "Cosine" is really just L2-normalized euclidean because the typical "1 - cosine similarity" isn't a real distance metric. I mention this because, as you may already know, "1 - correlation" isn't a distance metric either. Instead I typically center and scaled the input values for each sample, and then use the typical Euclidean distance, which turns out to be a monotonic function of the Pearson correlation (sqrt((1 - correlation)/2)
, or something like that, see here).
O.K. but what are permitted values? Perhaps
KmknnParam
should havedistance = c('A', 'B', 'C')
and inside usematch.arg
. I am keen to try Louvain clustering with a Pearson correlation distance, for instance.