kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/

Error with SNP contamination for some samples #127

Open Sarah145 opened 1 year ago

Sarah145 commented 1 year ago

Hi, firstly, thank you for creating such a great tool!

I've been running numbat on samples from multiple individuals at three separate timepoints. For some individuals the tool runs smoothly and the results look nice, but for others I'm getting errors related to SNP contamination.

I ran the pileup_and_phase.R script for multiple samples from the same individual jointly and then ran run_numbat on each sample individually. Here's an example of the output I'm getting for one of the samples.

Loading required package: Matrix
Numbat version: 1.3.0
Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 15
min_cells = 10
init_k = 3
max_cost = 1378.2
n_cut = 0
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
segs_loh = None
call_clonal_loh = FALSE
segs_consensus_fix = None
multi_allelic = TRUE
min_LLR = 10
min_overlap = 0.45
max_entropy = 0.9
skip_nj = TRUE
diploid_chroms = None
ncores = 16
ncores_nni = 16
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = FALSE
genome = hg38
Input metrics:
4594 cells
Mem used: 1.17Gb
Approximating initial clusters using smoothed expression ..
Mem used: 1.17Gb
number of genes left: 10520
running hclust...
Iteration 1
Mem used: 1.56Gb
High SNP contamination detected (40.9%). Please make sure that cells from only one individual are included in genotyping step.
Expression noise level (MSE): low (0.16). 
Running HMMs on 5 cell groups..
Error in `recycle_columns()`:
! Tibble columns must have compatible sizes.
• Size 166561: Column `3`.
• Size 229810: Column `2`.
• Size 245035: Column `1`.
ℹ Only values of size one are recycled.
Backtrace:
     ▆
  1. ├─numbat::run_numbat(...)
  2. │ └─bulk_subtrees %>% ...
  3. ├─numbat:::run_group_hmms(...)
  4. │ └─numbat:::find_common_diploid(...)
  5. │   └─... %>% bind_rows()
  6. └─dplyr::bind_rows(.)
  7.   ├─tibble::as_tibble(dots)
  8.   └─tibble:::as_tibble.list(dots)
  9.     └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 10.       └─tibble:::recycle_columns(x, .rows, lengths)
 11.         └─tibble:::abort_incompatible_size(.rows, names(x), lengths, "Requested with `.rows` argument")
 12.           └─tibble:::tibble_abort(...)
 13.             └─rlang::abort(x, class, ..., call = call, parent = parent, use_cli_format = TRUE)
Warning message:
In mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) { :
  scheduled cores 2, 1 encountered errors in user code, all values of the jobs will be affected
Execution halted

I'm not sure whether the error is triggered by the high SNP contamination warning or by something else. All cells are from the same individual, so I don't understand why high SNP contamination is being reported.

Any insight you can provide would be greatly appreciated!

teng-gao commented 1 year ago

Hi Sarah,

The SNP contamination message indicates that a large fraction of the SNPs in the profile are homozygous. However, the analysis should still run, so there may be an exception that is not handled properly. Feel free to share the input of one such sample via email (tgaoteng@gmail.com).

https://github.com/kharchenkolab/numbat/blob/a367fa55fd3ec6b516c3131b955b61e8d767a722/R/diagnostics.R#L151-L169
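To make "fraction of homozygous SNPs" concrete, here is a minimal sketch, written in Python rather than R purely for illustration. It is not numbat's actual check in diagnostics.R: the function name `homozygous_fraction`, the `(snp_id, alt_count, total_depth)` input layout, and the `min_depth=8` cutoff are all assumptions. It pools per-cell allele counts into a pseudobulk, keeps SNPs with enough coverage, and reports the share that show only one allele.

```python
from collections import defaultdict


def homozygous_fraction(records, min_depth=8):
    """Fraction of sufficiently covered SNPs that show only one allele.

    records: iterable of (snp_id, alt_count, total_depth), one entry per
    cell and SNP (hypothetical layout, assumed for this sketch).
    min_depth: minimum pooled depth for a SNP to be counted (assumed value).
    """
    alt = defaultdict(int)
    depth = defaultdict(int)

    # Pool counts across cells to form a pseudobulk profile per SNP.
    for snp_id, alt_count, total_depth in records:
        alt[snp_id] += alt_count
        depth[snp_id] += total_depth

    # Only SNPs with enough pooled reads can be called reliably.
    covered = [snp for snp in depth if depth[snp] >= min_depth]
    if not covered:
        return float("nan")

    # A SNP looks homozygous if every pooled read carries the same allele.
    homozygous = sum(
        1 for snp in covered if alt[snp] == 0 or alt[snp] == depth[snp]
    )
    return homozygous / len(covered)


if __name__ == "__main__":
    example = [
        ("snp1", 0, 5), ("snp1", 0, 5),    # pooled 0/10: reference only
        ("snp2", 4, 5), ("snp2", 2, 5),    # pooled 6/10: both alleles seen
        ("snp3", 10, 10),                  # pooled 10/10: alternate only
        ("snp4", 1, 2),                    # pooled depth 2: below cutoff
    ]
    print(homozygous_fraction(example))
```

In this toy input, two of the three covered SNPs show a single allele. A high value on real data would mean that many SNPs selected at the genotyping step look homozygous in the sample being analyzed, which is the situation the message describes.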

Sarah145 commented 1 year ago

Hi Teng, thanks for getting back to me! I shared the input files with you via email :blush: