kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/

Numbat is crashing with custom reference data #62

Closed Laolga closed 1 year ago

Laolga commented 1 year ago

Hi! I previously ran numbat with the provided reference data and it worked fairly well. I then wanted to try a more relevant reference and built one with the provided aggregate_counts function. It ran without any errors and produced a gene x cell type matrix without any NaNs.

But numbat is crashing when I use that reference:

Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 20
min_cells = 50
init_k = 3
max_cost = 819
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
multi_allelic = TRUE
min_LLR = 5
min_overlap = 0.45
max_entropy = 0.5
skip_nj = FALSE
diploid_chroms = 
ncores = 20
ncores_nni = 20
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = TRUE
genome = hg38
Input metrics:
2730 cells
Mem used: 1.93Gb
Approximating initial clusters using smoothed expression ..
Mem used: 1.93Gb
number of genes left: 1853
running hclust...
Registered S3 method overwritten by 'dendextend':
  method     from 
  rev.hclust vegan
Iteration 1
Mem used: 2.57Gb
Error in value[[3L]](cond) : `glue` failed in `formatter_glue` on:

  'try-error' chr "Error in optim(fn = function(w) { : L-BFGS-B needs finite values of 'fn'\n" 

Raw error message:

 Expecting '}' 

Please consider using another `log_formatter` or `skip_formatter` on strings with curly braces.`glue` failed in `formatter_glue` on:

  - attr(*, "condition")=List of 2 

Raw error message:

 Expecting '}' 

Please consider using another `log_formatter` or `skip_formatter` on strings with curly braces.`glue` failed in `formatter_glue` on:

   ..$ message: chr "L-BFGS-B needs finite values of 'fn'" 

Raw error message:

 Expecting '}' 

Please consider using another `log_formatter` or `skip_formatter` on strings with curly braces.`glue` failed in `formatter_glue` on:

   ..$ call   : language optim(fn = function(w) {     w = w/sum(w) ... 

Raw error message:

 Expecting '}' 

Please consider using another `log_formatter` or `skip_formatter` on strings with curly braces.`glue` failed in `formatter_glue` on:
In addition: Warning messages:
1: In log(lambdas_ref * 1e+06 + 1) : NaNs produced
2: In mclapply(groups, mc.cores = ncores, function(g) { :
  all scheduled cores encountered errors in user code

Could you please suggest what might be wrong?

Olga

teng-gao commented 1 year ago

Please share some code that you used to create the ref, and show some entries of the resulting ref matrix - I suspect the scaling is wrong.
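
One way to check the scaling is sketched below on a toy matrix. It assumes the reference is meant to hold non-negative expression fractions with each cell-type column summing to 1; that expectation is an assumption of this sketch, not something stated in the thread. The helper name check_ref_scaling and the toy matrices are hypothetical.

```r
# Hypothetical helper: per-column sanity checks for a genes x cell types reference.
# Assumption: a well-scaled reference is non-negative and each column sums to ~1.
check_ref_scaling <- function(ref, tol = 1e-6) {
  sums <- colSums(ref, na.rm = TRUE)
  data.frame(
    cell_type   = colnames(ref),
    col_sum     = unname(sums),
    n_negative  = unname(colSums(ref < 0, na.rm = TRUE)),
    n_missing   = unname(colSums(is.na(ref))),
    sums_to_one = unname(abs(sums - 1) < tol),
    stringsAsFactors = FALSE
  )
}

# Toy data, made up for illustration (not taken from the issue).
ref_ok <- matrix(
  c(0.50, 0.25, 0.25,   # column "TypeA"
    0.25, 0.25, 0.50),  # column "TypeB"
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("TypeA", "TypeB"))
)

# Same matrix with one negative entry, as could happen if the input
# was centered or scaled rather than raw counts.
ref_bad <- ref_ok
ref_bad["geneB", "TypeB"] <- -0.25

print(check_ref_scaling(ref_ok))
print(check_ref_scaling(ref_bad))
```

Any column with a non-zero n_negative or n_missing, or with sums_to_one equal to FALSE, would be worth inspecting before passing the matrix on.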

Laolga commented 1 year ago

> count_mat_ref = read.csv("count_mat.csv", row.names = 1)
> head(count_mat_ref[seq(1,5)])
                     RP11.206L10.2 RP11.54O7.16 HES4 ISG15 AGRN
AAACCTGCACGTTGGC-1-0             0            0    0     0    0
AAACCTGGTGCACTTA-1-0             0            0    0     0    0
AAACCTGTCAATACCG-1-0             0            0    0     0    0
AAACCTGTCGCGTAGC-1-0             0            0    0     0    0
AAACGGGAGGTCATCT-1-0             0            0    0     0    0
AAACGGGCAAAGGAAG-1-0             0            0    0     0    0
> cell2type = read.csv("celltypes.csv")
> colnames(cell2type) = c("cell", "group")
> head(cell2type)
                  cell                 group
1 AAACCTGCACGTTGGC-1-0 Malignant Basal State
2 AAACCTGGTGCACTTA-1-0 Malignant Basal State
3 AAACCTGTCAATACCG-1-0 Malignant Basal State
4 AAACCTGTCGCGTAGC-1-0 Malignant Basal State
5 AAACGGGAGGTCATCT-1-0 Malignant Basal State
6 AAACGGGCAAAGGAAG-1-0 Malignant Basal State
> ref_internal = aggregate_counts(t(count_mat_ref), cell2type)
cell_dict
                        Astrocytes                      Granule Cells              Malignant Basal State 
                                36                                162                               3543 
                 Malignant Cycling  Malignant Granule-like Progenitor   Malignant Neuronal Development I 
                               555                                181                               1657 
 Malignant Neuronal Development II Malignant Neuronal Development III                          Microglia 
                               871                                171                                 50 
> head(ref_internal)
               Astrocytes Granule Cells Malignant Basal State Malignant Cycling
RP11.206L10.2 0.000000000  3.151704e-05          4.925321e-04      2.944128e-05
RP11.54O7.16  0.000000000  0.000000e+00          1.038808e-03      2.566988e-05
HES4          0.005400502  2.767025e-04          2.605442e-05      1.749445e-03
ISG15         0.000000000  1.859600e-03          6.491731e-04      5.720368e-04
AGRN          0.000000000  0.000000e+00          7.412861e-04      5.547613e-05
TNFRSF18      0.000000000  3.787758e-04          1.154173e-03      0.000000e+00
              Malignant Granule-like Progenitor Malignant Neuronal Development I
RP11.206L10.2                      1.607266e-06                     3.168859e-04
RP11.54O7.16                       0.000000e+00                     2.813663e-05
HES4                               6.216030e-03                     4.306779e-06
ISG15                              3.068418e-06                     1.141342e-03
AGRN                               9.146807e-05                     4.081686e-04
TNFRSF18                           0.000000e+00                     4.138260e-04
              Malignant Neuronal Development II Malignant Neuronal Development III    Microglia
RP11.206L10.2                      0.0001436701                       1.209148e-05 0.000000e+00
RP11.54O7.16                       0.0006976547                       3.036306e-04 0.000000e+00
HES4                               0.0006754232                       2.060926e-04 6.577098e-07
ISG15                              0.0009031457                       1.155408e-05 0.000000e+00
AGRN                               0.0019187173                       1.808349e-04 1.873048e-03
TNFRSF18                           0.0002515563                       1.545023e-05 0.000000e+00

teng-gao commented 1 year ago

Hi @Laolga,

Thanks. First, try evaluating log(lambdas_ref * 1e+06 + 1) outside of the program to see where the NaNs are produced. Second, tumor cell types should not be included in the diploid reference.
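
Both suggestions can be sketched outside numbat on a toy matrix. The names ref_toy, find_bad_entries and ref_diploid are hypothetical, and the values are made up for illustration. In R, log() returns NaN for negative input and propagates NaN/NA, so the flagged entries are those that are negative or already missing in the reference. The second step drops columns whose names start with "Malignant", which assumes the cell-type naming used in this thread.

```r
# Toy reference (genes x cell types); matrix() fills column by column.
ref_toy <- matrix(
  c(0.2,  0.5, 0.3,   # column "Astrocytes"
    0.1, -0.4, 1.3,   # column "Malignant Cycling" (contains a negative value)
    0.6,  NaN, 0.4),  # column "Microglia" (contains a NaN)
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("Astrocytes", "Malignant Cycling", "Microglia"))
)

# Hypothetical helper: list the entries for which the transform from the
# warning message, log(lambdas_ref * 1e6 + 1), is NaN or NA.
find_bad_entries <- function(lambdas_ref) {
  # The "NaNs produced" warning is expected here, so it is suppressed.
  transformed <- suppressWarnings(log(lambdas_ref * 1e6 + 1))
  idx <- which(is.na(transformed), arr.ind = TRUE)
  data.frame(
    gene      = rownames(lambdas_ref)[idx[, "row"]],
    cell_type = colnames(lambdas_ref)[idx[, "col"]],
    value     = lambdas_ref[idx],
    stringsAsFactors = FALSE
  )
}

bad <- find_bad_entries(ref_toy)
print(bad)

# Keep only non-malignant cell types for the diploid reference.
# Assumption: malignant types are identifiable by a "Malignant" name prefix.
keep <- !grepl("^Malignant", colnames(ref_toy))
ref_diploid <- ref_toy[, keep, drop = FALSE]
print(colnames(ref_diploid))
```

With the real data, the same helper would be called on ref_internal, and the value column would show whether the offending entries are negative or missing.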