kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/

Error in `mutate()` while running numbat with snRNAseq data generated by Takara SMART-Seq Stranded Kit #122

Closed xiaoxinchen2022 closed 1 year ago

xiaoxinchen2022 commented 1 year ago

Hi, I am running numbat on mouse snRNA-seq data generated with the Takara SMART-Seq Stranded Kit. I analyzed 52 nuclei and got the error message below. Could anyone help take a look? The data I used for the analysis are attached.

    out = run_numbat(
        count_mat01,
        count_mat_ref,
        df_allele,
        t = 1e-5,
        ncores = 1,
        skip_nj = TRUE,
        min_LLR = 30,
        out_dir = './results',
        genome = "mm10",
        nu = 0
    )

Console output:

    Numbat version: 1.3.0
    Running under parameters:
    t = 1e-05
    alpha = 1e-04
    gamma = 20
    min_cells = 50
    init_k = 3
    max_cost = 15.6
    n_cut = 0
    max_iter = 2
    max_nni = 100
    min_depth = 0
    use_loh = auto
    segs_loh = None
    call_clonal_loh = FALSE
    segs_consensus_fix = None
    multi_allelic = TRUE
    min_LLR = 30
    min_overlap = 0.45
    max_entropy = 0.5
    skip_nj = TRUE
    diploid_chroms = None
    ncores = 1
    ncores_nni = 1
    common_diploid = TRUE
    tau = 0.3
    check_convergence = FALSE
    plot = TRUE
    genome = mm10
    Input metrics:
    52 cells
    Mem used: 0.488Gb
    Approximating initial clusters using smoothed expression ..
    Mem used: 0.488Gb
    number of genes left: 5129
    running hclust...
    Iteration 1
    Mem used: 0.488Gb
    High SNP contamination detected (86.9%). Please make sure that cells from only one individual are included in genotyping step.
    Expression noise level (MSE): high (94). Consider using a custom expression reference profile.
    Running HMMs on 2 cell groups..
    Error in mutate():
    ℹ In argument: seg = paste0(CHROM, generate_postfix(cumsum(boundary) + 1)).
    ℹ In group 20: CHROM = NA.
    Caused by error in while (i > 0) {
        remainder <- (i - 1) %% 26
        i <- (i - 1) %/% 26
        postfix <- c(alphabet[remainder + 1], postfix)
    }:
    ! missing value where TRUE/FALSE needed
    Run rlang::last_trace() to see where the error occurred.
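For readers puzzling over the final error: the loop quoted in the message stops with "missing value where TRUE/FALSE needed" as soon as its counter is NA, which is what the log reports for the group where CHROM = NA. Below is a minimal sketch reconstructed only from the loop body shown in the error message. The function name `postfix_sketch` and the use of lowercase letters as the alphabet are assumptions for illustration, not Numbat's actual source.

```r
# Hypothetical reconstruction of the loop quoted in the error message.
# `postfix_sketch` and `alphabet <- letters` are assumptions, not Numbat code.
postfix_sketch <- function(i) {
  alphabet <- letters
  postfix <- character(0)
  while (i > 0) {                      # fails if i is NA
    remainder <- (i - 1) %% 26
    i <- (i - 1) %/% 26
    postfix <- c(alphabet[remainder + 1], postfix)
  }
  paste(postfix, collapse = "")
}

# With valid segment indices the loop yields spreadsheet-style suffixes.
ok_a  <- postfix_sketch(1)    # first segment
ok_aa <- postfix_sketch(27)   # first index that needs two letters

# With a missing index the `while` condition cannot be evaluated.
err <- tryCatch(
  postfix_sketch(NA_integer_),
  error = function(e) conditionMessage(e)
)
print(c(ok_a, ok_aa))
print(err)
```

The practical takeaway is that rows with a missing CHROM value in the inputs are worth checking before calling run_numbat.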

Data files: count_mat01.txt, df_allele.txt, count_mat_ref.txt

teng-gao commented 1 year ago

Hi,

There are a few issues:

  1. The reference expression matrix should contain normalized expression values (prepared using numbat::aggregate_counts), not raw counts.
  2. The allele profile appears to be entirely homozygous. Please make sure to include only heterozygous SNPs. [image]
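A quick sanity check along the lines of the second point could look like the sketch below. It uses a small made-up table, and it assumes the allele data frame has a `CHROM` column and a phased genotype column `GT` with values such as "0|1" or "1|0". Those column names and genotype encodings are assumptions here and should be checked against your own df_allele.

```r
# Toy allele table; all values are invented for illustration only.
df_allele_toy <- data.frame(
  cell   = c("c1", "c1", "c2", "c2", "c2"),
  snp_id = c("1_100_A_G", "1_200_C_T", "1_100_A_G", "X_300_G_A", "2_50_T_C"),
  CHROM  = c("1", "1", "1", NA, "2"),
  GT     = c("0|1", "1|1", "0|1", "1|0", "0|0"),
  stringsAsFactors = FALSE
)

# Assumed encoding: heterozygous genotypes are "0|1" or "1|0".
het_gt    <- c("0|1", "1|0")
is_het    <- df_allele_toy$GT %in% het_gt
has_chrom <- !is.na(df_allele_toy$CHROM)

# Fraction of rows that are heterozygous; a value near 0 signals a problem.
frac_het <- mean(is_het)

# Keep only heterozygous SNPs with a usable chromosome label.
df_clean <- df_allele_toy[is_het & has_chrom, ]

print(frac_het)
print(df_clean)
```

If the heterozygous fraction is close to zero, as described above for this dataset, filtering leaves almost nothing to work with.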

Best, Teng

xiaoxinchen2022 commented 1 year ago

Hi Teng,

Thanks so much! Will you be able to share the rds with me so that I can explore the results? My email is xiaoxin.chen@ucsf.edu

Meanwhile, I will try to use numbat::aggregate_counts to prepare the normalized expression values.
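To illustrate the difference between raw counts and a normalized reference, here is a base R sketch that sums raw counts per annotated group and rescales each group to fractions. This only shows the general idea. It is not a reimplementation of numbat::aggregate_counts, whose exact normalization and arguments should be taken from the package documentation. All object names and numbers below are made up.

```r
# Toy gene x cell raw count matrix (values invented for illustration).
counts <- matrix(
  c(10, 0, 5,   20, 5, 0,   0, 30, 10,   5, 25, 20),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Hypothetical annotation mapping each cell to a reference group.
annot <- data.frame(
  cell  = colnames(counts),
  group = c("typeA", "typeA", "typeB", "typeB"),
  stringsAsFactors = FALSE
)

# Sum raw counts over the cells of each group (gene x group matrix).
group_sums <- sapply(
  split(annot$cell, annot$group),
  function(cells) rowSums(counts[, cells, drop = FALSE])
)

# Rescale every group column so that it sums to 1.
ref_sketch <- sweep(group_sums, 2, colSums(group_sums), "/")

print(ref_sketch)
```

The point is that the reference passed to run_numbat should be on a normalized scale like `ref_sketch`, not on the raw count scale like `counts`.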

The cells are all from BL6 mice with a pure genetic background. Is it still possible to use numbat for this analysis?

Would greatly appreciate your help.

Best,

Xiaoxin

teng-gao commented 1 year ago

Hi,

Unfortunately, allele-specific CNV analysis with Numbat requires heterozygous SNPs. A mouse with a pure genetic background has too few heterozygous SNPs across the genome, so this dataset is not amenable to Numbat analysis.
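The reasoning can be made concrete with a small idealized calculation: at a heterozygous site the expected alternate-allele fraction shifts when one haplotype is lost or gained, while at a homozygous site it stays the same at any copy number. The sketch below is a simplified illustration under stated assumptions (expression proportional to copy number, no noise, no allele-specific expression). It is not how Numbat models the data, and the function and column names are hypothetical.

```r
# Idealized expected alternate-allele fraction given allele copy numbers.
expected_alt_fraction <- function(alt_copies, ref_copies) {
  alt_copies / (alt_copies + ref_copies)
}

# Copy numbers of the alt/ref alleles under three copy-number states,
# for a heterozygous site and for a homozygous-alt site.
scenarios <- data.frame(
  state   = c("neutral", "one-copy loss", "one-copy gain"),
  het_alt = c(1, 1, 2), het_ref = c(1, 0, 1),  # loss removes the ref haplotype
  hom_alt = c(2, 1, 3), hom_ref = c(0, 0, 0)
)

scenarios$het_fraction <- expected_alt_fraction(scenarios$het_alt, scenarios$het_ref)
scenarios$hom_fraction <- expected_alt_fraction(scenarios$hom_alt, scenarios$hom_ref)

# The heterozygous fraction changes with the state; the homozygous one never does.
print(scenarios[, c("state", "het_fraction", "hom_fraction")])
```

In this toy setting only the heterozygous site distinguishes the three states, which is why a genome with almost no heterozygous SNPs carries no allelic signal.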

Best, Teng