jts / nanopolish

Signal-level algorithms for MinION data
MIT License
568 stars 159 forks source link

'nanopolish variants' fails when --ploidy is greater than two #779

Open TimD1 opened 4 years ago

TimD1 commented 4 years ago

I've been trying to call somatic variants in a diploid organism, and so would like to set --ploidy 4 for nanopolish variants. Whenever I do this, however, I get the following error:

nanopolish: src/nanopolish_variant_db.cpp:32: Combinations::Combinations(size_t, size_t, CombinationOption): Assertion `k <= N' failed.
./somatic_variant_pipeline: line 41: 16721 Aborted                 (core dumped)

I am calling Nanopolish as follows:

nanopolish variants \                                                                                  
     --verbose \                                                                                         
     --ploidy 4 \                                                                                        
     --snps \                                                                                            
     --calculate-all-support \                                                                           
     --max-haplotypes 1000 \                                                                             
     --min-candidate-depth 10 \                                                                          
     --min-candidate-frequency 0.1 \                                                                     
     --window $region \                                                                                  
     --threads $(nproc) \                                                                                
     --reads $proj_dir/$ds/merged/all.fastq \                                                            
     --bam $proj_dir/$ds/aligned/to_ref_final.bam \                                                      
     --genome $align_reference 

Here is the resulting backtrace:

[#0] 0x7ffff6944428 → __GI_raise(sig=0x6)
[#1] 0x7ffff694602a → __GI_abort()
[#2] 0x7ffff693cbd7 → __assert_fail_base(fmt=<optimized out>, assertion=0x763a0f "k <= N", file=0x7639b0 "src/nanopolish_variant_db.cpp", line=0x20, function=0x763be0 <Combinations::Combinations(unsigned long, unsigned long, CombinationOption)::__PRETTY_FUNCTION__> "Combinations::Combinations(size_t, size_t, CombinationOption)")
[#3] 0x7ffff693cc82 → __GI___assert_fail(assertion=0x763a0f "k <= N", file=0x7639b0 "src/nanopolish_variant_db.cpp", line=0x20, function=0x763be0 <Combinations::Combinations(unsigned long, unsigned long, CombinationOption)::__PRETTY_FUNCTION__> "Combinations::Combinations(size_t, size_t, CombinationOption)")
[#4] 0x47202c → Combinations::Combinations(this=<optimized out>, N=<optimized out>, k=<optimized out>, option=<optimized out>)
[#5] 0x4d256c → simple_call(variant_group=@0x18a62d80, ploidy=0x3, genotype_all_input_variants=0x0)
[#6] 0x4a17be → call_haplotype_from_candidates(alignments=@0x7fffffffd870, candidate_variants=std::vector of length 12, capacity 16 = {omitted for brevity}, alignment_flags=0x3)
[#7] 0x4a6b1a → call_variants_for_region(contig="chr6", region_start=0x18d37e4, region_end=0x18d3a3c)
[#8] 0x4a79b4 → call_variants_main(argc=<optimized out>, argv=<optimized out>)
[#9] 0x408ffd → std::function<int (int, char**)>::operator()(int, char**) const(__args#1=0x7fffffffe770, __args#0=0x14, this=0xd5efb0)

Looking at the source code, it appears that the constructor for Combinations is called with k==ploidy and N==variant_group.get_num_combinations(). The assertion which fails is k <= N. Combinations are added if considered a good_haplotype, so does this mean there aren't enough candidate variants within my region of interest to determine possible phasings for a 4-ploid organism? I've tried everything from relaxing to restricting the requirements for candidate variants and significantly increasing --max-haplotypes to one million, but nothing seems to help.

jts commented 4 years ago

Hi @TimD1,

This is an interesting one and I'm going to have to think about it a bit. I have to admit that the assertion doesn't make sense to me right now but it has been quite awhile since I wrote that code. Even if there are only 2 haplotypes (=combinations) it should be possible to arrange them into sets of 4 (AAAA, AAAB, AABB, ...).

I'll note that somatic calling isn't supported very well, even if you get ploidy 4 working. Nanopolish doesn't have a continuous allele frequency model (for subclonal mutations/cellularity) so I wouldn't expect the results to be very good. @jopineda in my group is working on somatic calling in general but it will be some time before we have results.

Jared

TimD1 commented 4 years ago

Okay, thanks for responding so quickly! You mention that "somatic calling isn't supported very well" because setting the ploidy to 4 is kind of a hacky fix to allow Nanopolish to call somatic variants. However, Nanopolish was the only nanopore tool flexible enough to even allow me to attempt it. I'm unaware of any other nanopore tools which are designed specifically for this purpose, do you know any off the top of your head? I know that WhatsHap (used by medaka_variant), Longshot, and Clair all specifically target germline mutations in diploid organisms.

jts commented 4 years ago

As far as I know there isn't a dedicated tool to call somatic mutations from nanopore reads.