Closed jzook closed 6 years ago
Note to self: could add a "filter" attribute to each sample, then apply the filter in getreads.get_read_batch()
. This would need some sort of UI, possibly a per-sample attribute similar to the sequencer
and single_ended
options.
closed by 203fedeba2b2f9a9d116135fce233d2d5e53edc3
You can now use my_10x.bam,split_hap=true
to specify that a bam file should be split based on the haplotype, similar to how you specify that a library is single-ended and should use pacbio error rates.
This looks like it works well with 10x data, which is great, but when I tried it with haplotype-separated PacBio data, I get the error below. Is it ok to specify both "sequencer=pacbio,split_hap=true"?
(svviz20a2) [jzook@sh-18-28 ~]$ svviz2 --report-only --min-mapq 20 --downsample 500 --outdir /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20 --ref /oak/stanford/groups/msalit/shared/genomes/hg19/hg19.fasta -V /scratch/PI/msalit/jzook/AJTrio/SVv050/union_171212_refalt.2.2.2.clustered.simpleINFO.techcounts.2techor4callset_200.vcf.gz /oak/stanford/groups/msalit/shared/giab/hg19/HG002/HG002_PB_70x_RG_HP10XtrioRTG.bam,sequencer=pacbio,split_hap=true
ssw library not found
rpy2 could not be imported; dotplots will not be generated
2018-06-19 08:47:28 - svviz2.app.sample - INFO - Calculating read statistics for HG002_PB_70x_RG_HP10XtrioRTG.bam
0 Counter({'unpaired': 2501})
2018-06-19 08:47:32 - svviz2.io.readstatistics - INFO - counts +/-:0 -/+:0 +/+:0 -/-:0 unpaired:2501
2018-06-19 08:47:32 - svviz2.app.sample - INFO - TIME to get read statistics:4.0s
2018-06-19 08:47:32 - svviz2.app.sample - INFO - Using realignment presets for pacbio
2018-06-19 08:47:32 - svviz2.app.sample - INFO - Loaded pre-computed read statistics for HG002_PB_70x_RG_HP10XtrioRTG_hap2
2018-06-19 08:47:32 - svviz2.app.sample - INFO - Loaded pre-computed read statistics for HG002_PB_70x_RG_HP10XtrioRTG_hap-unknown
2018-06-19 08:47:32 - svviz2.app.main - INFO - Search distance: 1,000bp
2018-06-19 08:47:32 - svviz2.app.main - INFO - Search distance: 1,000bp
2018-06-19 08:47:32 - svviz2.app.main - INFO - Search distance: 1,000bp
2018-06-19 08:47:32 - svviz2.app.main - INFO - Align distance: 29,044bp
[W::vcf_parse] FILTER 'NOT2TECH' is not defined in the header
[W::vcf_parse] INFO 'ClusterIDs' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'NumClusterSVs' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'ExactMatchIDs' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'NumExactMatchSVs' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'ClusterMaxShiftDist' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'ClusterMaxSizeDiff' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'ClusterMaxEditDist' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'PBcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'Illcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'TenXcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CGcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'PBexactcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'Illexactcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'TenXexactcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'CGexactcalls' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'HG2count' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'HG3count' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'HG4count' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'NumTechs' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'NumTechsExact' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'DistBack' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'DistForward' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'DistMin' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'DistMinlt1000' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'MultiTech' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'MultiTechExact' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'sizecat' is not defined in the header, assuming Type=String
2018-06-19 08:47:32 - svviz2.app.datahub - INFO - Working on variant 0: Deletion::14:80,884,184-80,884,208(25)
2018-06-19 08:47:32 - svviz2.app.datahub - INFO - Looking for existing realignments; will try to open a few bam files that may not yet exist...
[E::hts_open_format] Failed to open file /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20/svviz2-tempu_cgway4/HG3_Ill_150bpfermikitraw_18808.del_14_80884184.HG002_PB_70x_RG_HP10XtrioRTG_hap1.alt.sorted.bam
[E::hts_open_format] Failed to open file /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20/svviz2-tempu_cgway4/HG3_Ill_150bpfermikitraw_18808.del_14_80884184.HG002_PB_70x_RG_HP10XtrioRTG_hap2.alt.sorted.bam
[E::hts_open_format] Failed to open file /scratch/PI/msalit/jzook/AJTrio/SVv050/genosv20a1PBsplithapHP10XtrioRTGv0502techor4callsetMQ20/svviz2-tempu_cgway4/HG3_Ill_150bpfermikitraw_18808.del_14_80884184.HG002_PB_70x_RG_HP10XtrioRTG_hap-unknown.alt.sorted.bam
2018-06-19 08:47:32 - svviz2.app.datahub - INFO - Analyzing sample HG002_PB_70x_RG_HP10XtrioRTG_hap1
2018-06-19 08:47:32 - svviz2.io.getreads - INFO - Loading more reads...
Traceback (most recent call last):
File "/home/users/jzook/svviz20a2/bin/svviz2", line 11, in
load_entry_point('svviz2==2.0a3', 'console_scripts', 'svviz2')()
File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/app/main.py", line 60, in main
run(datahub)
File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/app/main.py", line 41, in run
datahub.genotype_cur_variant()
File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/app/datahub.py", line 109, in genotype_cur_variant
for batch in getreads.get_read_batch(sample, self):
File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/io/getreads.py", line 14, in get_read_batch
batch = list(_get_read_batch(sample, datahub))[0]
File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/io/getreads.py", line 28, in _get_read_batch
yield read_filter(batch)
File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/io/read_filters.py", line 6, in filter_haplotype
cur_read1 = read_pair.original_read_ends["1"]
File "/home/users/jzook/svviz20a2/lib/python3.6/site-packages/svviz2/remap/alignment.py", line 76, in getattr
return getattr(self._read, name)
AttributeError: 'pysam.libcalignedsegment.AlignedSegment' object has no attribute 'original_read_ends'
(svviz20a2) [jzook@sh-18-28 ~]$
On Fri, Jun 8, 2018 at 8:04 PM Noah Spies notifications@github.com wrote:
You can now use my_10x.bam,split_hap=true to specify that a bam file should be split based on the haplotype, similar to how you specify that a library is single-ended and should use pacbio error rates.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nspies/svviz2/issues/50#issuecomment-395922598, or mute the thread https://github.com/notifications/unsubscribe-auth/ACU6dti6eK6CJokcawDudd3_XeeIDDg9ks5t6xEggaJpZM4UHcxc .
Would it be easy for you to separate counts and visualizations of reads by haplotype for bam files that have the HP tag, like the typical 10x bam or the output of whatshap haplotag? I've been doing this manually by creating multiple bam files, but it would be significantly easier to input the full bam and separate svviz outputs for HP 1, HP 2, and no HP.
Thanks! Justin