Separating counts by haplotype

jzook commented 6 years ago

Would it be easy for you to separate counts and visualizations of reads by haplotype for bam files that have the HP tag, like the typical 10x bam or the output of whatshap haplotag? I've been doing this manually by creating multiple bam files, but it would be significantly easier to input the full bam and separate svviz outputs for HP 1, HP 2, and no HP.

Thanks! Justin

nspies commented 6 years ago

Note to self: could add a "filter" attribute to each sample, then apply the filter in getreads.get_read_batch(). This would need some sort of UI, possibly a per-sample attribute similar to the sequencer and single_ended options.

nspies commented 6 years ago

closed by 203fedeba2b2f9a9d116135fce233d2d5e53edc3

nspies commented 6 years ago

You can now use my_10x.bam,split_hap=true to specify that a bam file should be split based on the haplotype, similar to how you specify that a library is single-ended and should use pacbio error rates.

jzook commented 6 years ago

This looks like it works well with 10x data, which is great, but when I tried it with haplotype-separated PacBio data, I get the error below. Is it ok to specify both "sequencer=pacbio,split_hap=true"?

(svviz20a2) [jzook@sh-18-28 ~]$ svviz2 --report-only --min-mapq 20 --downsample 500 --outdir /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20 --ref /oak/stanford/groups/msalit/shared/genomes/hg19/hg19.fasta -V /scratch/PI/msalit/jzook/AJTrio/SVv050/union_171212_refalt.2.2.2.clustered.simpleINFO.techcounts.2techor4callset_200.vcf.gz /oak/stanford/groups/msalit/shared/giab/hg19/HG002/HG002_PB_70x_RG_HP10XtrioRTG.bam,sequencer=pacbio,split_hap=true

ssw library not found

rpy2 could not be imported; dotplots will not be generated

2018-06-19 08:47:28 - svviz2.app.sample - INFO - Calculating read statistics for HG002_PB_70x_RG_HP10XtrioRTG.bam

0 Counter({'unpaired': 2501})

2018-06-19 08:47:32 - svviz2.io.readstatistics - INFO - counts +/-:0 -/+:0 +/+:0 -/-:0 unpaired:2501

2018-06-19 08:47:32 - svviz2.app.sample - INFO - TIME to get read statistics:4.0s

2018-06-19 08:47:32 - svviz2.app.sample - INFO - Using realignment presets for pacbio

2018-06-19 08:47:32 - svviz2.app.sample - INFO - Loaded pre-computed read statistics for HG002_PB_70x_RG_HP10XtrioRTG_hap2

2018-06-19 08:47:32 - svviz2.app.sample - INFO - Loaded pre-computed read statistics for HG002_PB_70x_RG_HP10XtrioRTG_hap-unknown

2018-06-19 08:47:32 - svviz2.app.main - INFO - Search distance: 1,000bp

2018-06-19 08:47:32 - svviz2.app.main - INFO - Align distance: 29,044bp

[W::vcf_parse] FILTER 'NOT2TECH' is not defined in the header

[W::vcf_parse] INFO 'ClusterIDs' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'NumClusterSVs' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'ExactMatchIDs' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'NumExactMatchSVs' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'ClusterMaxShiftDist' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'ClusterMaxSizeDiff' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'ClusterMaxEditDist' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'PBcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'Illcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'TenXcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'CGcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'PBexactcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'Illexactcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'TenXexactcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'CGexactcalls' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'HG2count' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'HG3count' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'HG4count' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'NumTechs' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'NumTechsExact' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'DistBack' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'DistForward' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'DistMin' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'DistMinlt1000' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'MultiTech' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'MultiTechExact' is not defined in the header, assuming Type=String

[W::vcf_parse] INFO 'sizecat' is not defined in the header, assuming Type=String

2018-06-19 08:47:32 - svviz2.app.datahub - INFO - Working on variant 0: Deletion::14:80,884,184-80,884,208(25)

2018-06-19 08:47:32 - svviz2.app.datahub - INFO - Looking for existing realignments; will try to open a few bam files that may not yet exist...

[E::hts_open_format] Failed to open file /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20/svviz2-tempu_cgway4/HG3_Ill_150bpfermikitraw_18808.del_14_80884184.HG002_PB_70x_RG_HP10XtrioRTG_hap1.alt.sorted.bam

[E::hts_open_format] Failed to open file /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20/svviz2-tempu_cgway4/HG3_Ill_150bpfermikitraw_18808.del_14_80884184.HG002_PB_70x_RG_HP10XtrioRTG_hap2.alt.sorted.bam

[E::hts_open_format] Failed to open file /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20/svviz2-tempu_cgway4/HG3_Ill_150bpfermikitraw_18808.del_14_80884184.HG002_PB_70x_RG_HP10XtrioRTG_hap-unknown.alt.sorted.bam

2018-06-19 08:47:32 - svviz2.app.datahub - INFO - Analyzing sample HG002_PB_70x_RG_HP10XtrioRTG_hap1

2018-06-19 08:47:32 - svviz2.io.getreads - INFO - Loading more reads...

Traceback (most recent call last):

File "/home/users/jzook/svviz20a2/bin/svviz2", line 11, in

load_entry_point('svviz2==2.0a3', 'console_scripts', 'svviz2')()

File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/app/main.py", line 60, in main

run(datahub)

File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/app/main.py", line 41, in run

datahub.genotype_cur_variant()

File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/app/datahub.py", line 109, in genotype_cur_variant

for batch in getreads.get_read_batch(sample, self):

File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/io/getreads.py", line 14, in get_read_batch

batch = list(_get_read_batch(sample, datahub))[0]

File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/io/getreads.py", line 28, in _get_read_batch

yield read_filter(batch)

File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/io/read_filters.py", line 6, in filter_haplotype

cur_read1 = read_pair.original_read_ends["1"]

File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/remap/alignment.py", line 76, in getattr

return getattr(self._read, name)

AttributeError: 'pysam.libcalignedsegment.AlignedSegment' object has no attribute 'original_read_ends'

(svviz20a2) [jzook@sh-18-28 ~]$

On Fri, Jun 8, 2018 at 8:04 PM Noah Spies notifications@github.com wrote:

You can now use my_10x.bam,split_hap=true to specify that a bam file should be split based on the haplotype, similar to how you specify that a library is single-ended and should use pacbio error rates.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nspies/svviz2/issues/50#issuecomment-395922598, or mute the thread https://github.com/notifications/unsubscribe-auth/ACU6dti6eK6CJokcawDudd3_XeeIDDg9ks5t6xEggaJpZM4UHcxc .

nspies / svviz2

Separating counts by haplotype #50