kharchenkolab / numbat

Haplotype-aware CNV analysis from single-cell RNA-seq
https://kharchenkolab.github.io/numbat/

Error in if (UPGMA_score > NJ_score) #68

Open teng-gao opened 1 year ago

teng-gao commented 1 year ago

Error caused by providing segs_loh (ER3 in ovarian visium dataset)

Building phylogeny ..
Mem used: 0.517Gb
Using 9 CNVs to construct phylogeny
Aggregate function missing, defaulting to 'length'
Error in if (UPGMA_score > NJ_score) { : 
  missing value where TRUE/FALSE needed
Calls: run_numbat
In addition: Warning messages:
2: In log(1 - P) : NaNs produced
3: In log(1 - P) : NaNs produced
Execution halted

teng-gao commented 1 year ago

I decided not to merge the fix for now because it changes the results for NCI-N87. The fix is in the segs_loh branch.

anderswe commented 1 year ago

Hi Teng,

Sorry, still getting this error using the segs_loh branch.

Here's the full log below. Grateful for your work with numbat and for any advice you may have – thanks.

Found 9 regions with LOH/deletions.
# A tibble: 9 × 6
  CHROM seg   seg_start   seg_end snp_rate loh  
  <fct> <fct>     <int>     <int>    <dbl> <lgl>
1 1     1b    120150898 145992442     1.31 TRUE 
2 1     1d    148102046 149903320     8.98 TRUE 
3 2     2b    186694060 189031898     7.66 TRUE 
4 2     2d    200811910 201451740     9.81 TRUE 
5 9     9b     39072767  68705240    15.4  TRUE 
6 11    11b    59171430  61680391     8.67 TRUE 
7 14    14b    24299850  30622254    16.9  TRUE 
8 16    16b    67192155  68023284    18.2  TRUE 
9 19    19b    49446298  49528003     8.49 TRUE 
Running under parameters:
t = 1e-04
alpha = 1e-04
gamma = 20
min_cells = 50
init_k = 3
max_cost = 2460.5
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
multi_allelic = TRUE
min_LLR = 5
min_overlap = 0.45
max_entropy = 0.5
skip_nj = FALSE
diploid_chroms = 
ncores = 8
ncores_nni = 8
common_diploid = TRUE
tau = 0.5
check_convergence = FALSE
plot = TRUE
genome = hg38
Input metrics:
4921 cells
Mem used: 5.98Gb
Approximating initial clusters using smoothed expression ..
Mem used: 5.98Gb
number of genes left: 12701
running hclust...
Iteration 1
Mem used: 13.2Gb
Running HMMs on 5 cell groups..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Expression noise level: medium (0.72). 
Running HMMs on 3 cell groups..
Testing for multi-allelic CNVs ..
3 multi-allelic CNVs found: 19a,2a,9a
Evaluating CNV per cell ..
Mem used: 9.27Gb
Excluding clonal LOH regions .. 
All cells succeeded
Expanding allelic states..
Building phylogeny ..
Mem used: 9.58Gb
Using 23 CNVs to construct phylogeny
Aggregate function missing, defaulting to 'length'
Error in if (UPGMA_score > NJ_score) { : 
  missing value where TRUE/FALSE needed
Calls: run_numbat
In addition: Warning messages:
1: In log(1 - P) : NaNs produced
2: In log(1 - P) : NaNs produced
Execution halted

teng-gao commented 1 year ago

Hi @anderswe ,

Thanks, I will look into this.

Best, Teng

anderswe commented 1 year ago

Thanks, @teng-gao!

I'll give this a go right now.

MartinCastagne commented 1 year ago

Hi @teng-gao, thank you for the wonderful tool that is Numbat. I have an issue on some datasets where this same error appears, even though I set multi_allelic=FALSE in run_numbat. Here's the full log below, and thanks again for your help.

Attaching SeuratObject
Attaching sp
Loading required package: Matrix
Numbat version: 1.3.0
Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 20
min_cells = 50
init_k = 3
max_cost = 99.9
n_cut = 0
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
segs_loh = None
call_clonal_loh = TRUE
segs_consensus_fix = None
multi_allelic = FALSE
min_LLR = 5
min_overlap = 0.45
max_entropy = 0.5
skip_nj = FALSE
diploid_chroms = None
ncores = 16
ncores_nni = 16
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = TRUE
genome = hg38
Input metrics:
333 cells
Mem used: 3.92Gb
Calling segments with clonal LOH
Approximating initial clusters using smoothed expression ..
Mem used: 3.93Gb
number of genes left: 8524
running hclust...
Iteration 1
Mem used: 4.22Gb
Expression noise level (MSE): high (2). Consider using a custom expression reference profile.
Running HMMs on 4 cell groups..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Retesting CNVs..
Running HMMs on 2 cell groups..
Evaluating CNV per cell ..
Mem used: 4.17Gb
Excluding clonal LOH regions ..
All cells succeeded
Building phylogeny ..
Mem used: 4.18Gb
Using 10 CNVs to construct phylogeny
Aggregate function missing, defaulting to 'length'
Error in if (UPGMA_score > NJ_score) { :
  missing value where TRUE/FALSE needed
Calls: run_numbat
In addition: Warning messages:
1: In log(1 - P) : NaNs produced
2: In log(1 - P) : NaNs produced
Execution halted
srun: error: cpu-node-56: task 0: Exited with exit code 1

teng-gao commented 1 year ago

Hi @MartinCastagne ,

Is it possible to share your input data? Feel free to do so via email tgaoteng@gmail.com.

teng-gao commented 1 year ago

Hi @anderswe @MartinCastagne

This problem should be fixed in the main branch now (v1.3.1). Let me know if you still have the same issue.

BardeChoco225 commented 1 year ago

Hi,

First, thank you for this package, which seems really promising for CNV calling. I am trying to use WGS data that I have for my samples to provide CNV calls via segs_consensus_fix. My table looks like this: [screenshot of the table] When using Numbat, I get this error, which translates to "Error in if (UPGMA_score > NJ_score) { : missing value where TRUE/FALSE is required", and I can't find an explanation for it:

full error log

Maybe it is because my seg column looks like "1.1/1.2/1.3/etc." instead of "1.a/1.b/1.c/etc.", but I sometimes have more than 26 segments on the same chromosome. I was about to open a new issue, but then I found this thread, so I'm posting here. With the same data and parameters but without segs_consensus_fix, I have no issue, so it might be linked, even though the other people here don't seem to use it. Thank you very much for your time.

PS: after testing with multi_allelic = FALSE, the same error occurs.

teng-gao commented 1 year ago

The error usually means you have multiple segments with the same name. Have you checked if the segments are uniquely named?
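A minimal sketch of such a uniqueness check, written in Python for illustration and not taken from Numbat itself. It assumes the segment table holds one row per segment with a `seg` field (the column name shown in the tibble earlier in this thread). The helper name `find_duplicate_segs` and the sample rows are hypothetical.

```python
from collections import Counter


def find_duplicate_segs(rows):
    """Return {seg_name: count} for every segment name that appears more than once."""
    counts = Counter(row["seg"] for row in rows)
    return {seg: n for seg, n in counts.items() if n > 1}


# Hypothetical segment table; the name "1b" is deliberately repeated.
segs = [
    {"CHROM": "1", "seg": "1b", "seg_start": 100, "seg_end": 200},
    {"CHROM": "1", "seg": "1b", "seg_start": 300, "seg_end": 400},
    {"CHROM": "2", "seg": "2b", "seg_start": 100, "seg_end": 200},
]

duplicates = find_duplicate_segs(segs)
print(duplicates)
```

An empty result means every segment name is unique under this assumption. Any entry that is returned points at a name to rename before passing the table to Numbat.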

BardeChoco225 commented 1 year ago

I thought I did, but it turns out that an undesired type conversion made some of them identical. I fixed it and now everything seems to be working fine. Thank you for the suggestion!
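One way a type conversion can collapse segment names is sketched below in Python. This is an assumption about the kind of conversion involved, since the thread does not say which one occurred. If labels in the "1.1/1.2/1.3" style are parsed as numbers, trailing zeros are lost, so `1.10` becomes indistinguishable from `1.1`. The labels are hypothetical.

```python
# Hypothetical segment labels in the "chromosome.index" style.
labels = ["1.1", "1.2", "1.10", "1.20"]

# Parsing the labels as numbers drops trailing zeros, so distinct labels collide.
as_numbers = [float(label) for label in labels]
round_tripped = [str(value) for value in as_numbers]

print(round_tripped)
print(len(set(labels)), "unique labels when kept as strings")
print(len(set(round_tripped)), "unique labels after numeric conversion")
```

Keeping the seg column as text from the moment the table is read avoids this kind of collision.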

whitneyt1 commented 6 months ago

Hello!

I am receiving this same error:

image

Should each seg be completely unique instead of having 1718 counts per segment?

image

But it seems like my columns of P are unique?

image

Thank you @teng-gao

teng-gao commented 6 months ago

@whitneyt1 Hmm, something weird is happening. It seems like the P matrix wasn't constructed correctly. Would you be able to send me your source data? Feel free to email me at tgaoteng@gmail.com.