kharchenkolab / gpsFISH

Optimization of gene panels for targeted spatial transcriptomics

Problem with gpsFISH_optimize() #13

Closed karJac closed 1 year ago

karJac commented 1 year ago

Hi,

I was following your tutorial on gene panel selection when I ran into an error while executing the gpsFISH_optimize function. I tried debugging it, but I couldn't fix it. I would really appreciate it if you could find a moment to look at this. The problem doesn't occur when the tutorial code is run on the tutorial data.

I have uploaded to Dropbox the raw files for the analysis and the files that go directly into the gpsFISH_optimize function: https://www.dropbox.com/scl/fo/hm26kgcocba2sfzx6ufx6/h?dl=0&rlkey=66du8r6xokvizaxs2hu1fl5ql

> GA = gpsFISH_optimize(earlyterm = 10, converge.cutoff = 0.01, n = dim(sc_count)[1],
    k = panel_size, ngen = 10, popsize = pop_size, verbose = 1, cluster = 1,
    initpop = initpop, method = "NaiveBayes", metric = "Accuracy", nCV = 5, rate = 1,
    cluster_size_max = 50, cluster_size_min = 30,
    two_step_sampling_type = c("Subsampling_by_cluster", "Simulation"),
    simulation_model = "ZINB", sample_new_levels = "old_levels",
    use_average_cluster_profiles = FALSE, save.intermediate = FALSE,
    full_count_table = as.data.frame(t(sc_count)), cell_cluster_conversion = sc_cluster,
    relative_prop = relative_prop, simulation_parameter = simulation_params,
    gene2include.id = gene2include.id, gene.weight = gene.weight,
    weight_penalty = weight_penalty)
Error in plogis(x) : Non-numeric argument to mathematical function

> traceback()
10: plogis(x)
9: boot::inv.logit(rep_col(gamma_i_gene, dim(sc_prop)[2]) * sqrt(sc_prop) + rep_col(c_i_gene, dim(sc_prop)[2]))
8: ZINB_predict(sc_prop = sc_prop, simulation_parameter = simulation_parameter, sample_new_levels = sample_new_levels, gene_list = gene_list, cell_list = cell_list, num_gene = num_gene, num_cell = num_cell)
7: simulation_ZINB(count_table = count_table, cell_cluster_conversion = cell_cluster_conversion, relative_prop = relative_prop, simulation_parameter = simulation_parameter, sample_new_levels = sample_new_levels, use_average_cluster_profiles = use_average_cluster_profiles)
6: sc2spatial(count_table = data4cv, cell_cluster_conversion = class_label_per_cell, simulation_type = simulation_type, simulation_parameter = simulation_parameter, simulation_model = simulation_model, relative_prop = relative_prop, sample_new_levels = sample_new_levels, use_average_cluster_profiles = use_average_cluster_profiles)
5: FUN(X[[i]], ...)
4: lapply(cvround, classifier_per_cv, cvlabel = cvlabel, data4cv = subsub_count_table, class_label_per_cell = class_label_per_cell, metric = metric, method = method, relative_prop = relative_prop, sample_new_levels = sample_new_levels, use_average_cluster_profiles = use_average_cluster_profiles, simulation_type = two_step_sampling_type[2], simulation_parameter = simulation_parameter, simulation_model = simulation_model, cell_cluster_conversion = cell_cluster_conversion, weight_penalty = weight_penalty)
3: OF(P[i, ], ...)
2: getfitness(pop)
1: gpsFISH_optimize(earlyterm = 10, converge.cutoff = 0.01, n = dim(sc_count)[1], k = panel_size, ngen = 10, popsize = pop_size, verbose = 1, cluster = 1, initpop = initpop, method = "NaiveBayes", metric = "Accuracy", nCV = 5, rate = 1, cluster_size_max = 50, cluster_size_min = 30, two_step_sampling_type = c("Subsampling_by_cluster", "Simulation"), simulation_model = "ZINB", sample_new_levels = "old_levels", use_average_cluster_profiles = FALSE, save.intermediate = FALSE, full_count_table = as.data.frame(t(sc_count)), cell_cluster_conversion = sc_cluster, relative_prop = relative_prop, simulation_parameter = simulation_params, gene2include.id = gene2include.id, gene.weight = gene.weight, weight_penalty = weight_penalty)

> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] cowplot_1.1.1 ggdendro_0.1.23 reshape2_1.4.4
[4] pheatmap_1.0.12 boot_1.3-28.1 splitTools_0.3.2
[7] naivebayes_0.9.7 pROC_1.18.0 caret_6.0-93
[10] lattice_0.20-45 ranger_0.14.1 pagoda2_1.0.10
[13] igraph_1.4.1 data.table_1.14.8 gpsFISH_0.1.0
[16] doRNG_1.8.6 rngtools_1.5.2 foreach_1.5.2
[19] RcisTarget_1.18.2 lubridate_1.9.2 forcats_1.0.0
[22] stringr_1.5.0 dplyr_1.1.0 purrr_1.0.1
[25] readr_2.1.4 tidyr_1.3.0 tibble_3.1.8
[28] ggplot2_3.4.1 tidyverse_2.0.0 SeuratObject_4.1.3
[31] Seurat_4.3.0 trqwe_0.1 SingleCellExperiment_1.20.0
[34] SummarizedExperiment_1.28.0 GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
[37] IRanges_2.32.0 S4Vectors_0.36.2 MatrixGenerics_1.10.0
[40] matrixStats_0.63.0 SCENIC_1.3.1 Biobase_2.58.0
[43] BiocGenerics_0.44.0 Matrix_1.5-3

YidaZhang0628 commented 1 year ago

Hi,

Thank you for reaching out to us. The error occurs because your sc_count object is a dense matrix from the Matrix package (class dgeMatrix) rather than a regular base R matrix. As a result, the matrices in your relative_prop object are also dgeMatrix objects. The matrix in relative_prop is used as part of the input to the function inv.logit, which cannot handle dgeMatrix input. To solve the problem, you can either convert your sc_count object to a regular matrix using sc_count = as.matrix(sc_count), or convert the matrices in relative_prop using relative_prop$cell.level = as.matrix(relative_prop$cell.level) and relative_prop$cluster.average = as.matrix(relative_prop$cluster.average).
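
The conversion described above can be sketched on toy data as follows. This is only an illustration: the objects ending in _toy are made-up stand-ins for sc_count and relative_prop, and to_base_matrix is a hypothetical helper that is not part of gpsFISH. It assumes the Matrix package is installed (it normally ships with R).

```r
library(Matrix)  # recommended package that normally ships with R

# Toy stand-ins for the objects in this thread (values are made up).
sc_count_toy <- Matrix(c(0, 3, 1, 0, 2, 5), nrow = 2, sparse = FALSE)
relative_prop_toy <- list(
  cell.level      = Matrix(c(0.1, 0.9, 0.4, 0.6, 0.5, 0.5), nrow = 2, sparse = FALSE),
  cluster.average = Matrix(c(0.2, 0.8, 0.3, 0.7), nrow = 2, sparse = FALSE)
)

# Hypothetical helper (not part of gpsFISH): coerce to a base R matrix
# only when the input is not one already.
to_base_matrix <- function(x) if (is.matrix(x)) x else as.matrix(x)

# Convert the count matrix and the two relative_prop matrices named above.
sc_count_fixed <- to_base_matrix(sc_count_toy)
for (nm in c("cell.level", "cluster.average")) {
  relative_prop_toy[[nm]] <- to_base_matrix(relative_prop_toy[[nm]])
}

# Inspect the classes before and after the conversion.
print(class(sc_count_toy))
print(class(sc_count_fixed))

# plogis() is the call at the top of the traceback; it accepts the
# converted base matrix.
p <- plogis(sqrt(relative_prop_toy$cell.level))
```

Running class() on the real sc_count and on each element of relative_prop before calling gpsFISH_optimize is a quick way to confirm which objects still need converting.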

In addition, I noticed from the files you sent me that the diagonal values of your weight_penalty matrix are all 0. They should all be 1. Otherwise, correct classifications will always count as 0 and you will always get accuracy = 0.
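
A minimal sketch of checking and repairing the diagonal is shown below. The cluster names, the toy confusion matrix, and the weighted_accuracy function are assumptions made for illustration; in particular, weighted_accuracy is a generic weighting scheme and not necessarily the exact formula gpsFISH uses internally.

```r
# Toy weight matrix with made-up cluster names; all entries are 0,
# so the diagonal is 0 as in the problematic case described above.
clusters <- c("A", "B", "C")
weight_penalty_toy <- matrix(0, nrow = 3, ncol = 3,
                             dimnames = list(clusters, clusters))

# Toy confusion matrix for a perfect classifier (all counts on the diagonal).
confusion <- diag(c(5, 4, 6))
dimnames(confusion) <- list(clusters, clusters)

# Assumed, generic weighted accuracy: each cell of the confusion matrix is
# credited by the matching weight. Not taken from the gpsFISH source.
weighted_accuracy <- function(confusion, weight) {
  sum(confusion * weight) / sum(confusion)
}

# With a zero diagonal, correct predictions earn no credit.
acc_before <- weighted_accuracy(confusion, weight_penalty_toy)

# Check the diagonal and set it to 1 if needed.
if (any(diag(weight_penalty_toy) != 1)) {
  diag(weight_penalty_toy) <- 1
}
acc_after <- weighted_accuracy(confusion, weight_penalty_toy)

print(c(before = acc_before, after = acc_after))
```

In this toy setup the repaired weight matrix is the identity, so the weighted score reduces to ordinary accuracy.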

Hope this solves the problem. Don't hesitate to let me know if you have any further questions.