paultpearson / TDAmapper

(R package) Analyze High-Dimensional Data Using Discrete Morse Theory

cluster_cutoff_at_first_empty_bin "invalid number of breaks" error #4

Closed brianmunson closed 7 years ago

brianmunson commented 7 years ago

Here is a simple reproducible example:

    library(TDAmapper)  # provides cluster_cutoff_at_first_empty_bin

    # three points whose pairwise distances are all 0.5
    dist_mat <- matrix(rep(0.5, 9), nrow=3)
    diag(dist_mat) <- 0
    level_distance_matrix <- as.dist(dist_mat)
    clust <- hclust(level_distance_matrix, method="single")
    heights <- clust$height
    level_max_distance <- max(level_distance_matrix)
    num_bins_when_clustering <- 10 # any positive value will do
    cluster_cutoff_at_first_empty_bin(heights, level_max_distance, num_bins_when_clustering)

# Error in hist.default(c(heights, diam), breaks = bin_breaks, plot = FALSE) : invalid number of 'breaks'

Details:

In the code for cluster_cutoff_at_first_empty_bin, with the variables as above, the problem occurs here:

    bin_breaks <- seq(from=min(heights), to=diam, by=(diam - min(heights))/num_bins_when_clustering)
    # bin_breaks will be the single number 0.5 which is less than 1.
    myhist <- hist(c(heights,diam), breaks=bin_breaks, plot=FALSE)
    # Error in hist.default(c(heights, diam), breaks = bin_breaks, plot = FALSE) : invalid number of 'breaks'
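The degenerate case can also be checked in base R alone, without TDAmapper. This is a minimal sketch that hard-codes the values from the example above (two merge heights of 0.5 and diam of 0.5); the variable `result` is introduced here only to capture the error message.

```r
# Base R only; values mirror the reproducible example (all merge heights equal diam).
heights <- c(0.5, 0.5)
diam <- 0.5
num_bins_when_clustering <- 10

# Here from == to and by == 0, so seq() returns a single value rather than a grid.
bin_breaks <- seq(from = min(heights), to = diam,
                  by = (diam - min(heights)) / num_bins_when_clustering)
length(bin_breaks)

# A length-one `breaks` is read by hist() as a requested number of cells,
# and a value below 1 is rejected. Capture the message instead of stopping.
result <- tryCatch(
  hist(c(heights, diam), breaks = bin_breaks, plot = FALSE),
  error = function(e) conditionMessage(e)
)
result
```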

I think the issue is that when bin_breaks is a single number, hist interprets it as the number of cells. From experimentation (I didn't look at the hist code), a non-integer value seems to be rounded down, so 0.5 ends up as fewer than one cell. With bin_breaks set up this way, the only way to get a single number is if min(heights) == diam, and I believe that can only happen in a situation like the one I described, where the interpoint distances are all equal. Maybe one way to fix this is to simply set bin_breaks = 1 if length(bin_breaks) == 1.
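The suggested guard could look roughly like the sketch below. `make_bin_breaks` is a hypothetical helper written for illustration, not a function in TDAmapper, and it is not necessarily how the package was eventually changed. It builds the breaks exactly as in the snippet above and only falls back to a single cell when `seq()` returns one value.

```r
# Hypothetical helper (not part of TDAmapper): build the breaks as in the
# snippet above, then fall back to one cell in the degenerate case.
make_bin_breaks <- function(heights, diam, num_bins_when_clustering) {
  bin_breaks <- seq(from = min(heights), to = diam,
                    by = (diam - min(heights)) / num_bins_when_clustering)
  if (length(bin_breaks) == 1) {
    # hist() reads a single number >= 1 as a requested cell count
    bin_breaks <- 1
  }
  bin_breaks
}

# Degenerate input from the example: all merge heights equal diam.
degenerate_breaks <- make_bin_breaks(c(0.5, 0.5), 0.5, 10)
myhist <- hist(c(0.5, 0.5, 0.5), breaks = degenerate_breaks, plot = FALSE)

# Ordinary input: the seq() result is passed through unchanged.
regular_breaks <- make_bin_breaks(c(0.2, 0.5), 1, 10)
```

With the fallback in place, the `hist` call no longer stops on the all-equal-distances input. What the cutoff logic should then return for a single-bin histogram is a separate design decision that this sketch does not address.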

paultpearson commented 7 years ago

I updated the code to deal with the special case length(bin_breaks) == 1.