navinlabcode / copykat

copykat issue: error in step 4 #84

Open YubinXie opened 1 year ago

YubinXie commented 1 year ago

I have tried running copykat in multiple sessions on my 70k-cell dataset, and I consistently get this error:

[1] "running copykat v1.1.0"
[1] "step1: read and filter data ..."
[1] "22318 genes, 73901 cells in raw data"
[1] "8197 genes past LOW.DR filtering"
[1] "step 2: annotations gene coordinates ..."
[1] "start annotation ..."
[1] "step 3: smoothing data with dlm ..."
[1] "step 4: measuring baselines ..."
[1] "23535 known normal cells found in dataset"
[1] "run with known normal..."
[1] "baseline is from known input"
WARNING:rpy2.rinterface_lib.callbacks:R[write to console]: Error in hclust(d, method = "ward.D2") : size cannot be NA nor exceed 65536
Calls: -> -> copykat -> hclust

longyingda commented 1 year ago

copykat calls the hclust function and specifies the "ward.D" method for clustering. It is possible that you have too many genes, so you could filter out genes that are expressed in fewer than 3 (or more) cells to reduce the number of genes.

In copykat, the call to hclust():

if(distance=="euclidean"){
  ...
  hcc <- hclust(parallelDist::parDist(t(mat.adj), threads = n.cores, method = distance), method = "ward.D")
}else {
  hcc <- hclust(as.dist(1-cor(mat.adj, method = distance)), method = "ward.D")
  ...
}

but in hclust()

hclust function{
  ...
  n <- as.integer(attr(d, "Size"))
  if(is.null(n)) stop("invalid dissimilarities")
  if(is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536")
  ...
}
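
The size check quoted above applies to the number of items being clustered, that is, the "Size" of the distance object, not to the number of features. As a rough illustration, the same check can be mirrored in Python (the log shows copykat being driven through rpy2) and applied to the two counts reported in the log. The function name `check_hclust_size` is hypothetical, and the NA case is approximated with `None`; this is a sketch of the logic, not R's actual implementation.

```python
HCLUST_MAX_SIZE = 65536  # limit taken from the hclust() check quoted in this thread


def check_hclust_size(n_items):
    """Mirror the quoted hclust() guard: reject a missing or oversized item count."""
    if n_items is None or n_items > HCLUST_MAX_SIZE:
        raise ValueError("size cannot be NA nor exceed 65536")
    return n_items


# Counts reported in the log above.
counts = {"genes past LOW.DR filtering": 8197, "cells in raw data": 73901}

for label, n in counts.items():
    try:
        check_hclust_size(n)
        print(f"{label}: {n} passes the size check")
    except ValueError as err:
        print(f"{label}: {n} fails the size check ({err})")
```

Under this reading, only the cell count in the log is large enough to trip the limit; the filtered gene count is well below it.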

YubinXie commented 1 year ago

Thanks for the reply. I figured it out. The clustering method (hclust) accepts at most 65,536 items, and here the items are cells, so with 73,901 cells one has to reduce the cell number.
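
One possible way to reduce the cell number is to split the cells into batches that each stay under the limit and run copykat on each batch separately. The sketch below only shows the splitting step in plain Python. The function name `split_into_batches` and the placeholder barcodes are hypothetical, and it does not address how known normal cells should be distributed across batches or whether results from separate batches are comparable.

```python
import math
import random

HCLUST_MAX_SIZE = 65536  # limit taken from the hclust() check quoted in this thread


def split_into_batches(cell_ids, max_size=HCLUST_MAX_SIZE, seed=0):
    """Shuffle cell IDs and split them into the fewest near-equal batches,
    each holding at most max_size cells."""
    ids = list(cell_ids)
    if not ids:
        return []
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_batches = math.ceil(len(ids) / max_size)
    # Striding spreads cells evenly, so batch sizes differ by at most one.
    return [ids[i::n_batches] for i in range(n_batches)]


# Placeholder barcodes standing in for the 73901 cells reported in the log.
cells = [f"cell_{i}" for i in range(73901)]
batches = split_into_batches(cells)
print([len(batch) for batch in batches])
```

Each batch of barcodes could then be used to subset the expression matrix before calling copykat. Random downsampling to a single subset below the limit is a simpler alternative if not every cell needs a prediction.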