whatshap / whatshap

Read-based phasing of genomic variants, also called haplotype assembly
https://whatshap.readthedocs.io
MIT License
331 stars 38 forks source link

whatshap polyphase: Assertion violated (Get != Set): XXX-XXXX -nan -nan #472

Open amvarani opened 1 year ago

amvarani commented 1 year ago

Hi there

I'm trying to phase a hexaploid plant genome showing a haploid length of ~1,6Gb A total of six sequel II SMRT cells were mapped into a haploid version of this genome generated with hifasm + purge_dups The assembled haploid version of the genome is good, showing N50 > 85Mb The BAM and VCF files were generated according to the commands below:

minimap2 -ax map-hifi haploid.fasta HiFi.fastq.gz -t 60 |samtools view -@ 60 -Sb -| samtools sort -@ 32 -o file.sorted.bam - freebayes-parallel <(fasta_generate_regions.py haploid.fasta.fai 100000) 60 -f haploid.fasta -p 6 --use-best-n-alleles 6 file.sorted.bam > file.vcf

Running whatshap polyphase I have got several "Assertion violated" messages like below:

whatshap polyphase -t 30 -p 6 --ignore-read-groups -o run001.vcf.gz file.vcf file.sorted.bam

This is WhatsHap (polyploid) 2.0 running under Python 3.10.9 ======== Working on chromosome 'ptg000002l_1' ---- Processing individual HiFi Number of variants skipped due to missing genotypes: 0 Number of remaining heterozygous variants: 584422 Found 89567 reads covering 584402 variants Kept 89501 reads that cover at least two variants each Detecting connected components with weak interconnect .. Split heterozygous variants into 176 blocks (and 7018 singleton blocks). Phasing block 1 of 176 with 7112 reads and 49820 variants. Phasing block 2 of 176 with 3996 reads and 28354 variants. Phasing block 3 of 176 with 3612 reads and 27439 variants. Phasing block 4 of 176 with 2972 reads and 27418 variants. Phasing block 5 of 176 with 4031 reads and 25009 variants. Phasing block 6 of 176 with 4479 reads and 20422 variants. Phasing block 7 of 176 with 4696 reads and 20168 variants. Phasing block 8 of 176 with 3512 reads and 19148 variants. Phasing block 9 of 176 with 2937 reads and 19113 variants. Phasing block 10 of 176 with 3019 reads and 18801 variants. Phasing block 11 of 176 with 3622 reads and 17198 variants. Phasing block 12 of 176 with 2891 reads and 15187 variants. Phasing block 13 of 176 with 2726 reads and 14042 variants. Phasing block 14 of 176 with 1788 reads and 13183 variants. Phasing block 15 of 176 with 1258 reads and 12977 variants. Phasing block 16 of 176 with 3187 reads and 12878 variants. Phasing block 17 of 176 with 2228 reads and 12796 variants. Phasing block 18 of 176 with 1220 reads and 12776 variants. Phasing block 19 of 176 with 1771 reads and 12168 variants. Phasing block 20 of 176 with 1887 reads and 12003 variants. Phasing block 21 of 176 with 1438 reads and 11591 variants. Phasing block 22 of 176 with 1266 reads and 10779 variants. Phasing block 23 of 176 with 1750 reads and 10727 variants. Assertion violated (Get != Set): 1612 1606 -nan -nan Assertion violated (Get != Set): 1613 1606 -nan -nan Assertion violated (Get != Set): 1613 1612 -nan -nan Assertion violated (Get != Set): 1614 1606 -nan -nan Assertion violated (Get != Set): 1614 1612 -nan -nan Assertion violated (Get != Set): 1614 1613 -nan -nan Assertion violated (Get != Set): 1616 1606 -nan -nan Assertion violated (Get != Set): 1616 1612 -nan -nan Assertion violated (Get != Set): 1616 1613 -nan -nan Assertion violated (Get != Set): 1616 1614 -nan -nan

many others "Assertion violated" messages, and them

multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/home/user001/miniconda3/lib/python3.10/multiprocessing/pool.py", line 125, in worker result = (True, func(*args, **kwds)) File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/algorithm.py", line 265, in phase_single_block_mt result = phase_single_block( File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/algorithm.py", line 214, in phase_single_block res = solve_polyphase_instance(subm, subgeno, sub_param, timers, quiet=True) File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/algorithm.py", line 80, in solve_polyphase_instance phase_single_block( File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/algorithm.py", line 187, in phase_single_block threads, haplotypes = run_threading( File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/threading.py", line 42, in run_threading affine_switch_cost = ceil(compute_readlength_snp_distance_ratio(allele_matrix) / 1.0) File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/threading.py", line 69, in compute_readlength_snp_distance_ratio return length / len(allele_matrix) ZeroDivisionError: division by zero """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/home/user001/.local/bin/whatshap", line 8, in sys.exit(main()) File "/home/user001/.local/lib/python3.10/site-packages/whatshap/main.py", line 64, in main module.main(args) File "/home/user001/.local/lib/python3.10/site-packages/whatshap/cli/polyphase.py", line 549, in main run_polyphase(**vars(args)) File "/home/user001/.local/lib/python3.10/site-packages/whatshap/cli/polyphase.py", line 247, in run_polyphase ) = phase_single_individual( File "/home/user001/.local/lib/python3.10/site-packages/whatshap/cli/polyphase.py", line 315, in phase_single_individual result = solve_polyphase_instance(allele_matrix, genotype_list, param, timers, prephasing) File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/algorithm.py", line 113, in solve_polyphase_instance blockwise_results = [res.get() for res in process_results] File "/home/user001/.local/lib/python3.10/site-packages/whatshap/polyphase/algorithm.py", line 113, in blockwise_results = [res.get() for res in process_results] File "/home/user001/miniconda3/lib/python3.10/multiprocessing/pool.py", line 774, in get raise self._value ZeroDivisionError: division by zero

schrins commented 1 year ago

Thanks for reporting this issue. The violated assertions indicate that the internal read scoring produces NaN values at some point, which is bad. All following errors are very likely to originate from this because of invalid computations. It is difficult to say where the NaN values exactly come from. I will have a look at the code to see, if I can spot potential caveats or corner cases, that I overlooked before. For systematic debugging, it would be necessary, though, to reproduce the bug myself (for which I would need small example data that produce the error) or you could clone the repository and compile the code, because then I could send you some modified source files with debug output, that would help me locate the error.

Cloning and compiling is relatively easy if you already use either conda or virtual environments: https://whatshap.readthedocs.io/en/latest/develop.html

senthilk88 commented 9 months ago

Hi, I'm facing similar issue on a tetraploid potato data .. the last few lines of the error are as follows

whatshap polyphase -o Eva_variants_phased.vcf --reference=/nfs7/ROOTS/Sathuvalli_Lab/senthil/EVA/haplotype/DM6_1.fa /nfs7/ROOTS/Sathuvalli_Lab/senthil/EVA/haplotype/gatk/bwa-vcf/Eva_variants.vcf.gz /nfs7/ROOTS/Sathuvalli_Lab/senthil/EVA/haplotype/eva_bwa_marked_duplicates.bam --ploidy 4 --ignore-read-groups

Assertion violated (Get != Set): 1761 229 -nan -nan Assertion violated (Get != Set): 1761 230 -nan -nan Assertion violated (Get != Set): 1761 231 -nan -nan Assertion violated (Get != Set): 1761 232 -nan -nan Assertion violated (Get != Set): 1761 233 -nan -nan Assertion violated (Get != Set): 1761 234 -nan -nan Assertion violated (Get != Set): 1761 235 -nan -nan Assertion violated (Get != Set): 1761 236 -nan -nan Assertion violated (Get != Set): 1761 237 -nan -nan Assertion violated (Get != Set): 1761 238 -nan -nan Assertion violated (Get != Set): 1761 239 -nan -nan Assertion violated (Get != Set): 1761 240 -nan -nan Assertion violated (Get != Set): 1761 241 -nan -nan Assertion violated (Get != Set): 1761 242 -nan -nan Assertion violated (Get != Set): 1761 243 -nan -nan Assertion violated (Get != Set): 1761 244 -nan -nan Assertion violated (Get != Set): 1761 245 -nan -nan Assertion violated (Get != Set): 1761 246 -nan -nan Assertion violated (Get != Set): 1761 247 -nan -nan Assertion violated (Get != Set): 1761 248 -nan -nan Assertion violated (Get != Set): 1761 249 -nan -nan Assertion violated (Get != Set): 1761 250 -nan -nan Assertion violated (Get != Set): 1761 251 -nan -nan Assertion violated (Get != Set): 1761 252 -nan -nan Assertion violated (Get != Set): 1761 253 -nan -nan Assertion violated (Get != Set): 1761 254 -nan -nan Assertion violated (Get != Set): 1761 255 -nan -nan Assertion violated (Get != Set): 1761 256 -nan -nan Assertion violated (Get != Set): 1761 257 -nan -nan Assertion violated (Get != Set): 1761 258 -nan -nan Assertion violated (Get != Set): 1761 259 -nan -nan Assertion violated (Get != Set): 1761 260 -nan -nan Assertion violated (Get != Set): 1761 261 -nan -nan Assertion violated (Get != Set): 1761 262 -nan -nan Assertion violated (Get != Set): 1761 263 -nan -nan Assertion violated (Get != Set): 1761 264 -nan -nan Assertion violated (Get != Set): 1761 265 -nan -nan Assertion violated (Get != Set): 1761 266 -nan -nan Assertion violated (Get != Set): 1761 267 -nan -nan Assertion violated (Get != Set): 1761 268 -nan -nan Assertion violated (Get != Set): 1761 269 -nan -nan Assertion violated (Get != Set): 1761 270 -nan -nan Assertion violated (Get != Set): 1761 271 -nan -nan client_loop: send disconnect: Broken pipe

I can send you the sample data that can reproduce the error or I can clone and compile the code and produce the error

NaphatCode commented 9 months ago

@senthilk88 You may try WhatsHap 2.2.dev9+g18b518d that @schrins added countermeasures and fallbacks to fix similar issue. Please note that this is not an official release and I must apologize if I am interrupting official support.

To access WhatsHap 2.2.dev9+g18b518d, these command lines provided by @schrins in other issue should work

git clone https://github.com/whatshap/whatshap.git
cd whatshap
git checkout issue-498
conda create -n whtest python=3.10
pip install -e .

Best regards, Niel.

senthilk88 commented 9 months ago

Hi NapHatCode, Thank you for helping me out. This worked out great.

DavCP commented 8 months ago

I'm sorry for duplicating this comment but the last Issue I commented on was already closed:

I´m having a similar issue with my data when using the polyphase option. Phasing works as expected when using the phase option but when trying with plyphase I´m getting the following output.

======== Working on chromosome 'T511-LG14_Aco014822_L411' ---- Processing individual T511_supercontig.fasta Number of variants skipped due to missing genotypes: 0 Number of remaining heterozygous variants: 8 Found 5077 reads covering 8 variants Kept 1254 reads that cover at least two variants each Detecting connected components with weak interconnect .. Split heterozygous variants into 2 blocks (and 0 singleton blocks). Processing block 1 of 2 with 81 reads and 5 variants. Processing block 2 of 2 with 1173 reads and 3 variants. Assertion violated (Get != Set): 2 1 -nan -nan Assertion violated (Get != Set): 3 1 -nan -nan Assertion violated (Get != Set): 3 2 -nan -nan Assertion violated (Get != Set): 4 1 -nan -nan No minimum in last threaded column! Stored local cluster id was higher than size of global id vector! 0 31 2 Problem occured at position 4 in row [0, 0] Stored local cluster id was higher than size of global id vector! 0 31 2 Problem occured at position 3 in row [0, 0] Stored local cluster id was higher than size of global id vector! 0 31 2 Problem occured at position 2 in row [4294967295, 4294967295] Stored local cluster id was higher than size of global id vector! 0 31 4 Problem occured at position 1 in row [0, 0] Stored local cluster id was higher than size of global id vector! 0 31 2 Problem occured at position 0 in row [0, 0] Traceback (most recent call last): File "/home/plancarte/.local/bin/whatshap", line 8, in sys.exit(main()) ^^^^^^ File "/home/plancarte/.local/lib/python3.12/site-packages/whatshap/main.py", line 64, in main module.main(args) File "/home/plancarte/.local/lib/python3.12/site-packages/whatshap/cli/polyphase.py", line 552, in main run_polyphase(**vars(args)) File "/home/plancarte/.local/lib/python3.12/site-packages/whatshap/cli/polyphase.py", line 253, in run_polyphase ) = phase_single_individual( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/plancarte/.local/lib/python3.12/site-packages/whatshap/cli/polyphase.py", line 318, in phase_single_individual result = solve_polyphase_instance(allele_matrix, genotype_list, param, timers, prephasing) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/plancarte/.local/lib/python3.12/site-packages/whatshap/polyphase/algorithm.py", line 84, in solve_polyphase_instance phase_single_block( File "/home/plancarte/.local/lib/python3.12/site-packages/whatshap/polyphase/algorithm.py", line 190, in phase_single_block threads, haplotypes = run_threading( ^^^^^^^^^^^^^^ File "/home/plancarte/.local/lib/python3.12/site-packages/whatshap/polyphase/threading.py", line 51, in run_threading assert len(paths) == num_vars AssertionError

In some loci I get the following:

======== Working on chromosome 'T511-LG07_Aco014501_L335' ---- Processing individual T511_supercontig.fasta Number of variants skipped due to missing genotypes: 0 Number of remaining heterozygous variants: 8 Found 1067 reads covering 8 variants Kept 148 reads that cover at least two variants each Detecting connected components with weak interconnect .. Split heterozygous variants into 2 blocks (and 0 singleton blocks). Processing block 1 of 2 with 59 reads and 3 variants. Processing block 2 of 2 with 89 reads and 4 variants. ======== Writing VCF Done writing VCF

But in most of them I get the Assertion violated one.

Btw, as suggested by schrins I installed the unreleased development version 2.2.dev10+g2a2077c.

I'll be really grateful if you guys can help me figure this out

schrins commented 8 months ago

Hello everyone,

I forgot about the open branch, it could have been merged back at least to the main branch quite some time ago. I will merge it right away and open a new one for the remaining issue here. Your local copy of issue-498 will still work, but you can as well switch to main, pull and recompile (with pip install -e .) as well.

Regarding the comment of @DavCP : Looks like the error happens exactly on the block with super high coverage (1173 reads for 3 variants). There might still be a numeric problem left. Would it be possible to share one of the problematic inputs for debugging? I might not even need a reference genome (in case you used one yourself) to reproduce this bug.

DavCP commented 8 months ago

@schrins It seems that the issue is solved after recompiling whatshap, it finally managed to run without errors. However, I'm getting an output similar to this:

======== Working on chromosome 'T511-LG17_Aco003492_L323_324' ---- Processing individual T511_supercontig.fasta Number of variants skipped due to missing genotypes: 0 Number of remaining heterozygous variants: 27 Found 5453 reads covering 27 variants Kept 4514 reads that cover at least two variants each Detecting connected components with weak interconnect .. Split heterozygous variants into 1 blocks (and 0 singleton blocks). Processing block 1 of 1 with 4514 reads and 27 variants. Warning: Found 2070899 NaN scores during read scoring!

polyphase runs normally, but I get these warnings in blocks with high coverage. Do you still need my data for testing and debuggin or do you have any idea of why these warnings are occurring and what they mean?

schrins commented 8 months ago

The warnings indicate, that some calculations are still invalid, but I added a safety clause to catch all these NaN values, instead of letting the program crash as before. If you can share a small test sample, it would be helpful to debug.

DavCP commented 8 months ago

The warnings indicate, that some calculations are still invalid, but I added a safety clause to catch all these NaN values, instead of letting the program crash as before. If you can share a small test sample, it would be helpful to debug.

@schrins , I already sent you a Drive folder link to your e-mail, I really appreciate your help and support.

schrins commented 8 months ago

Thanks for the data, this was very helpful. I am about to upload a new branch with the name issue-472, where I do not receive any warnings anymore for your BAM file. However, I did not run all the contigs, because there seems to be another issue: Running time. The phaser gets through the contigs eventually, but takes absurdly long on some of them. There is a certain part in the algorithm that scales quadratically with coverage. With "normal" long read coverages this is no issue at all, but your data goes as high as 7000X on some variants.

If might be possible to solve this, if the algorithm internally merged reads with identical allele pattern (same covered variants with exactly the same alleles). This would require quite some changes though, so I would rather open a new issue for this.

As a workaround, you could downsample the reads for the problematic contigs, if the running time is not tolerable for now. In theory, reducing the number of reads by factor 2 should improve the running time by about a factor of 4.

DavCP commented 8 months ago

Thanks for your feedback on my data and your suggestions! I was already planning on downsampling or capping my read coverage, so that's the way I'm going to do it.