vsbuffalo / readphaser

experimental read phasing from HapCut
6 stars 5 forks source link

rp.py error on multi-block hapcut file #6

Open zwuckl opened 8 years ago

zwuckl commented 8 years ago

Hey Vince,

I'm using your tool in the course of haplotype phasing moss contigs. As a first proof of concept, I called 212 variants (mostly SNPs) with freebayes on a specific contig of ~6,5kb and tried to phase them with hapcut and readphaser. I have a bam file of reads mapped against the contig containing 825 alignments and the necessary output of hapcut containing 142 variants in 3 blocks. Unfortunately, when I run rp.py like this:

python ~/Software/readphaser/pr.py -u unphased.fq -p phased.fq hapcut.out mapping.bam

I get the following error message:

[phase_reads] opening alignment BAM file...     done.
[phase_reads] phasing 'NODE_7344_length_6467_cov_3.70633_ID_13259', block_id 0
[error] inconsistent totals for contig NODE_7344_length_6467_cov_3.70633_ID_13259# contig='NODE_7344_length_6467_cov_3.70633_ID_13259' block_id=0 indels=36 no_overlap=460 phased=325 unmapped=2 filtered=36 total=825
NODE_7344_length_6467_cov_3.70633_ID_13259      0       539     GTGATGGG:6;AAGATTGT:2
...
NODE_7344_length_6467_cov_3.70633_ID_13259      0       2863    C:6;T:12
[phase_reads] phasing 'NODE_7344_length_6467_cov_3.70633_ID_13259', block_id 1
Traceback (most recent call last):
  File "/home/vangessel/Software/readphaser/pr.py", line 364, in <module>
    output_main(args)
  File "/home/vangessel/Software/readphaser/pr.py", line 334, in output_main
    region=args.region)
  File "/home/vangessel/Software/readphaser/pr.py", line 297, in phase_reads
    group_reads_by_block(reads, block, block_id, callback, stats, inconsistent_counts_file)
  File "/home/vangessel/Software/readphaser/pr.py", line 211, in group_reads_by_block
    htype_pos = group_varpos_by_haplotype(readpair, haplotypes, allele_counts)
  File "/home/vangessel/Software/readphaser/pr.py", line 147, in group_varpos_by_haplotype
    read_var = read.query[interval[0]-read.pos:interval[1]-read.pos]
TypeError: 'NoneType' object has no attribute '__getitem__'

It stops right where the second of the three blocks starts and thus fills the phased.fq with reads for all variants included in the first block. As I am no python expert, I'm not sure how to track down this error. The reads in the given region and their alignments don't look suspicious.

When I split up the blocks into individual input files, the first block gets processed successfully. However, the second and third block do not.

Do you have any idea? If it's necessary, I can provide you with the input files. As they are only for one contig, they are rather small in size...

I would very much appreciate your help. Regards, Nico

vsbuffalo commented 8 years ago

Yes, could you provide a small sample file that the error occurs on?

zwuckl commented 8 years ago

Thanks for the quick reply. Here is my data: readphaser_error.zip