paultpearson / TDAmapper

(R package) Analyze High-Dimensional Data Using Discrete Morse Theory

Error in vertices_in_level[[k1]] : subscript out of bounds #3

Closed brianmunson closed 7 years ago

brianmunson commented 7 years ago

mapper2D occasionally fails with this error. From reading the code, I guess the failure is in the loop that constructs the adjacency matrix, but I cannot figure out exactly why it happens. In every test I've run, across various data sets and filters, I can make the run succeed by lowering the number of intervals enough. If I fix the intervals and change the filters, it completes for some filters but not for others. All of this makes me suspect the problem occurs when there are too many empty level sets.
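One way this message can arise in base R is sketched here. It is a guess at the mechanism, not a reading of TDAmapper's source. If a list is filled only for non-empty level sets, trailing empty levels never get a slot, and `[[` with an index past the end of the list raises "subscript out of bounds". The names `points_per_level` and `vertices_in_level` and the counts are illustrative assumptions.

```r
# Hypothetical data: number of points falling in each of 5 level sets.
points_per_level <- c(3, 0, 2, 0, 0)

# Fill a list only for non-empty levels, so the trailing empty levels
# (4 and 5) are never assigned and the list stops at index 3.
vertices_in_level <- list()
for (k in seq_along(points_per_level)) {
  if (points_per_level[k] > 0) {
    vertices_in_level[[k]] <- seq_len(points_per_level[k])
  }
}

print(length(vertices_in_level))  # shorter than the number of levels

# Indexing a level past the end of the list reproduces the error message.
msg <- tryCatch(
  {
    vertices_in_level[[4]]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If something like this is the cause, pre-allocating the list with `vector("list", n)` would be one way to keep every level addressable.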

Here is one way to reproduce the error: use the default settings for mapper2D but change the number of intervals from c(5,5) to c(11,12), and it fails with the error. It completes successfully with c(12,11), however. To try something slightly different, I removed the 2* in front of the cosine term in the default sampled points and filter values, and got the same result: it worked for c(12,11) and failed for c(11,12).
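A small helper can make this kind of comparison across interval settings repeatable. The sketch below is a generic base-R pattern: `try_settings` and `fake_fit` are hypothetical names, and `fake_fit` is a stand-in that merely mimics the reported pattern so the helper can run without TDAmapper. It says nothing about what mapper2D actually does.

```r
# Run `fit` once per setting and record whether it succeeded or errored.
try_settings <- function(settings, fit) {
  outcome <- vapply(settings, function(s) {
    tryCatch(
      {
        fit(s)
        "ok"
      },
      error = function(e) conditionMessage(e)
    )
  }, character(1))
  data.frame(
    setting = vapply(settings, paste, character(1), collapse = ","),
    outcome = outcome,
    stringsAsFactors = FALSE
  )
}

# Stand-in for the real call (NOT mapper2D): fails whenever the second
# interval count exceeds the first, purely to exercise the helper.
fake_fit <- function(num_intervals) {
  if (num_intervals[2] > num_intervals[1]) stop("subscript out of bounds")
  invisible(TRUE)
}

res <- try_settings(list(c(5, 5), c(11, 12), c(12, 11)), fake_fit)
print(res)

# With TDAmapper installed, the stand-in could be replaced by the default
# example with only the interval counts varying:
# fit <- function(n) mapper2D(
#   distance_matrix = dist(data.frame(x = 2*cos(1:100), y = sin(1:100))),
#   filter_values = list(2*cos(1:100), sin(1:100)),
#   num_intervals = n, percent_overlap = 50, num_bins_when_clustering = 10)
```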

If I can figure out how to successfully use the debugger on your code I will look at it more closely.

paultpearson commented 7 years ago

Are you using the new version of TDAmapper available on github or the old version on CRAN? If you're using the old CRAN version, uninstall the CRAN version using remove.packages("TDAmapper") and then install the new github version using the instructions from the README.md file in the github repo for TDAmapper.

If the problem persists using the github version of TDAmapper, could you send me code and data that exhibit the bug so that I can reproduce the problem and try to fix it?

Thanks!

brianmunson commented 7 years ago

Paul- Installing the latest version via github directly seems to have solved the problem. My mistake; I didn't even know the CRAN and github versions were different (they are both called 1.0 when I run sessionInfo()). In any case: thanks for writing this code. It's been a lot of fun to play with. Best, Brian
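Since both builds report version 1.0, the version number alone cannot tell them apart. A minimal sketch of one way to check follows, assuming the package was installed with devtools/remotes, which record `RemoteType` and `RemoteSha` fields in the installed DESCRIPTION file. Other installers may behave differently, and `describe_install_source` is a hypothetical helper name.

```r
# Report where an installed package came from, based on the optional
# RemoteType / RemoteSha fields in its DESCRIPTION file.
describe_install_source <- function(pkg) {
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg))) {
    return("not installed")
  }
  remote_type <- utils::packageDescription(pkg, fields = "RemoteType")
  remote_sha <- utils::packageDescription(pkg, fields = "RemoteSha")
  if (is.na(remote_type)) {
    return("no remote metadata (typical of a CRAN or base install)")
  }
  paste0(remote_type, " @ ", remote_sha)
}

print(describe_install_source("stats"))      # base package: no remote metadata
print(describe_install_source("TDAmapper"))  # depends on how it was installed

# Reinstalling from GitHub as suggested in this thread
# (commented out because it modifies the package library):
# remove.packages("TDAmapper")
# devtools::install_github("paultpearson/TDAmapper")
```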


paultpearson commented 7 years ago

Honestly, if passing all the tests on CRAN were less time consuming, the version on CRAN would be up to date. I just don't have time right now to take on updating TDAmapper on CRAN. Enjoy! Paul

brianmunson commented 7 years ago

Hi Paul- We emailed a couple of months ago when I opened an issue on github regarding your TDAmapper project. I've since installed the latest version from github, but the same error (Error in vertices_in_level[[j]] : subscript out of bounds) has come up several times since then when using mapper1D. I was too busy to get around to making sure it was reproducible. I just encountered it again this evening and figured I ought to let you know. So that you can reproduce it, I've attached the .csv file that I read in, and included the relevant R code and output from my machine.

If you'd rather I can open this issue on github once again but I figured it might be easier just to email you directly. I'm sorry about the file size: I wish I had a smaller data set on which the problem could be easily reproduced. For what it's worth, I will try to spend a little time tomorrow looking at the script for mapper1D, although I doubt I will find anything.

Best, Brian Munson

relevant code for reproducing this, cut and pasted from a fresh R session:

require(TDAmapper)
Loading required package: TDAmapper
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] TDAmapper_1.0

loaded via a namespace (and not attached):
[1] tools_3.3.1

devtools::install_github("paultpearson/TDAmapper")
Skipping install of 'TDAmapper' from a github remote, the SHA1 (a7d80f99) has not changed since last install.
Use force = TRUE to force installation
data <- read.csv("~/Documents/R/Shiny/App-1/cc_samp_norm.csv")
dist_obj <- dist(data)
filter_obj <- data[,"V2"]
mapper_obj <- mapper1D(dist_obj, filter_obj, 20, 50, 10)
[1] "Level set has only one point"
[1] "Level set is empty"
[1] "Level set has only one point"
[1] "Level set is empty"
[1] "Level set is empty"
Error in vertices_in_level[[j]] : subscript out of bounds
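The "Level set is empty" messages above suggest that some intervals of the cover contain no points. A rough way to check this before calling mapper1D is sketched below. It assumes the cover consists of equal-length intervals spanning the filter range, with consecutive intervals overlapping by the given percentage. That construction is an assumption made for illustration and is not confirmed to match TDAmapper's internals. `count_points_per_interval` is a hypothetical helper, and the filter values are made up.

```r
# Count filter values in each interval of an evenly spaced, overlapping cover.
# ASSUMPTION: equal-length intervals spanning [min, max], each overlapping the
# next by `percent_overlap` percent of its length.
count_points_per_interval <- function(filter_values, num_intervals, percent_overlap) {
  f_min <- min(filter_values)
  f_max <- max(filter_values)
  overlap <- percent_overlap / 100
  interval_length <- (f_max - f_min) / (num_intervals - (num_intervals - 1) * overlap)
  step_size <- interval_length * (1 - overlap)
  vapply(seq_len(num_intervals), function(j) {
    lo <- f_min + (j - 1) * step_size
    # Clamp the last upper bound to f_max to avoid floating-point drift.
    hi <- if (j == num_intervals) f_max else lo + interval_length
    sum(filter_values >= lo & filter_values <= hi)
  }, integer(1))
}

# Made-up filter values with a large gap in the middle.
toy_filter <- c(0, 0.5, 1, 11, 11.5, 12)
counts <- count_points_per_interval(toy_filter, num_intervals = 5, percent_overlap = 50)
print(counts)
print(which(counts == 0))  # indices of intervals with no points

# For the session above, the analogous check would be:
# count_points_per_interval(filter_obj, 20, 50)
```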


paultpearson commented 7 years ago

After the update to the file cluster_cutoff_at_first_empty_bin.R, I am unable to reproduce the error you report. Here's my code that works without reporting any errors.


title: "TDAmapper debugging"
author: "Paul Pearson"
date: "March 29, 2017"
output: html_document

knitr::opts_chunk$set(echo = TRUE)

Debug

library(TDAmapper)

dist_mat <- matrix(rep(0.5,9), nrow=3)
diag(dist_mat) <- 0
level_distance_matrix <- as.dist(dist_mat)
clust <- hclust(level_distance_matrix, method="single")
heights <- clust$height
level_max_distance <- max(level_distance_matrix)
num_bins_when_clustering <- 10 # any positive value will do

cluster_cutoff_at_first_empty_bin <- function(heights, diam, num_bins_when_clustering) {

  # if there are only two points (one height value), then we have a single cluster
  if (length(heights) == 1) {
    if (heights == diam) {
      cutoff <- Inf
      return(cutoff)
    }
  }

  bin_breaks <- seq(from=min(heights), to=diam, 
                    by=(diam - min(heights))/num_bins_when_clustering)
  if (length(bin_breaks) == 1) { bin_breaks <- 1 }

  myhist <- hist(c(heights,diam), breaks=bin_breaks, plot=FALSE)
  z <- (myhist$counts == 0)
  if (sum(z) == 0) {
    cutoff <- Inf
    return(cutoff)
  } else {
    #  which returns the indices of the logical vector (z == TRUE), min gives the smallest index
    cutoff <- myhist$mids[ min(which(z == TRUE)) ]
    return(cutoff)
  }

}

cluster_cutoff_at_first_empty_bin(heights, level_max_distance, num_bins_when_clustering)
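The debug input above has all pairwise distances equal, so it exercises an edge case. For contrast, here is a small self-contained sketch of the first-empty-bin idea on data that does have a gap. It re-implements the idea inline using only base R instead of calling the function above, and the four points and `toy_*` names are made up for illustration.

```r
# Four points on a line: three close together and one far away.
toy_pts <- c(0, 1, 3, 11)
toy_dist <- dist(toy_pts)
toy_clust <- hclust(toy_dist, method = "single")
print(toy_clust$height)  # single-linkage merge heights

# Histogram of the merge heights plus the diameter, using 10 equal-width bins
# between the smallest merge height and the diameter.
toy_diam <- max(toy_dist)
toy_breaks <- seq(from = min(toy_clust$height), to = toy_diam, length.out = 11)
toy_hist <- hist(c(toy_clust$height, toy_diam), breaks = toy_breaks, plot = FALSE)

# The midpoint of the first empty bin is used as the height at which to cut.
toy_cutoff <- toy_hist$mids[which(toy_hist$counts == 0)[1]]
toy_clusters <- cutree(toy_clust, h = toy_cutoff)
print(toy_cutoff)
print(toy_clusters)
```

Cutting at the first gap in the merge heights separates the three nearby points from the distant one.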

Test Mapper2D

library(igraph)

m2 <- mapper2D(
    distance_matrix = dist(data.frame( x=2*cos(1:100), y=sin(1:100) )),
    filter_values = list( 2*cos(1:100), sin(1:100) ),
    num_intervals = c(11,12),
    percent_overlap = 50,
    num_bins_when_clustering = 10)

g2 <- graph.adjacency(m2$adjacency, mode="undirected")
plot(g2, layout = layout.auto(g2) )