iskandr opened this issue 8 years ago
Another option: --context 50 would add the 50 nucleotides of sequence surrounding each variant (useful for filtering out variants in homopolymer regions).
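A minimal sketch of what such an option might compute. It assumes the reference is already available as an in-memory Python string and that the variant is given in interbase (0-based, half-open) coordinates. The function names, the default width of 50, and the homopolymer threshold of 6 identical bases are hypothetical illustrations, not varlens's implementation.

```python
from typing import Tuple


def flanking_context(reference: str, start: int, end: int, width: int) -> Tuple[str, str]:
    """Return (5' flank, 3' flank) of up to `width` bases around [start, end).

    Coordinates are interbase: the variant's reference bases are reference[start:end].
    Flanks are clipped at the ends of the sequence.
    """
    five_prime = reference[max(0, start - width):start]
    three_prime = reference[end:end + width]
    return five_prime, three_prime


def longest_run(seq: str) -> int:
    """Length of the longest stretch of one repeated base in seq (0 for an empty string)."""
    best = 0
    current = 0
    previous = None
    for base in seq.upper():
        current = current + 1 if base == previous else 1
        previous = base
        best = max(best, current)
    return best


def in_homopolymer(reference: str, start: int, end: int,
                   width: int = 50, min_run: int = 6) -> bool:
    """Hypothetical crude filter: flag the variant if either flank contains
    a run of at least `min_run` identical bases."""
    five_prime, three_prime = flanking_context(reference, start, end, width)
    return max(longest_run(five_prime), longest_run(three_prime)) >= min_run
```

For example, with reference = "CCGTGTCCAACATGA" + "T" + "AGTGACCAGGGAGAC", flanking_context(reference, 15, 16, 15) returns the two 15-base flanks on either side of the T. Scanning each flank separately avoids counting a run that only exists because the variant base itself was concatenated in.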
Does https://github.com/hammerlab/varlens/issues/12 block this?
Is there a released version with some installation/usage documentation?
Good call, I can look into this. There is not currently a released and documented version, but I should make one. (Currently you have to run pip install . from the checkout, and the only docs are what you get from running with -h.) I should be able to do this next week, but if we end up actively blocked on this please let me know.
Varlens has been revamped and documented, and should now be more usable. I haven't done a pip release yet but will do that soon. See the README for basic examples: https://github.com/hammerlab/varlens. Each tool should also have reasonable help output now.
Here's an example command that does what is asked for in this ticket:
$ varlens-variants \
    test/data/CELSR1/vcfs/vcf_1.vcf \
    test/data/CELSR1/vcfs/vcf_2.vcf \
    --reads \
        test/data/CELSR1/bams/bam_1.bam \
        test/data/CELSR1/bams/bam_2.bam \
        test/data/CELSR1/bams/bam_3.bam \
    --include-read-evidence \
    --include-gene \
    --include-effect \
    --include-context \
    --reference ~/sinai/data/human_g1k_v37_reformatted.fasta
Output:
genome,contig,interbase_start,interbase_end,ref,alt,sources,effect,gene,context_5_prime,context_3_prime,context_mutation,1.bam_count_num_alt,1.bam_count_num_ref,1.bam_count_total_depth,2.bam_count_num_alt,2.bam_count_num_ref,2.bam_count_total_depth,3.bam_count_num_alt,3.bam_count_num_ref,3.bam_count_total_depth
GRCh37,22,21829554,21829555,T,G,1.vcf,non-coding-transcript,PI4KAP2,CCGTGTCCAACATGA,AGTGACCAGGGAGAC,T>G,0,0,0,0,0,0,0,0,0
GRCh37,22,46931059,46931060,A,C,1.vcf,p.S670A,CELSR1,CCCCCCATGAGCTCC,CCACCAGCGTGTCCA,T>G,0,222,329,0,93,93,0,279,323
GRCh37,22,46931061,46931062,G,A,1.vcf 2.vcf,p.S669F,CELSR1,CGCCCCCCATGAGCT,CTCCACCAGCGTGTC,C>T,0,330,330,2,91,93,1,321,324
GRCh37,22,50636217,50636218,A,C,1.vcf,intronic,TRABD,GCAGCCCCGCAGGGA,GGGCAACGGGCTGGG,T>G,0,0,0,0,0,0,0,0,0
GRCh37,22,50875932,50875933,A,C,1.vcf,splice-acceptor,PPP6R2,TAGTCAGAGAAGGCC,GGGAGGGAGGGAGGG,T>G,0,0,0,0,0,0,0,0,0
GRCh37,22,45309892,45309893,T,G,2.vcf,p.T214P,PHF21B,ATGGGGAGGGAGGGG,GAGGGGAAGAGAGGA,T>G,0,0,0,0,0,0,0,0,0
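In the output above, context_mutation does not always echo the ref and alt columns: A>C sites are reported as T>G and the G>A site as C>T. That is consistent with the convention, common in mutational-signature analysis, of describing every substitution from the pyrimidine (C or T) strand, and the flanking context appears to be reverse-complemented along with it (the flanks of the two overlapping CELSR1 sites only line up under that reading). A minimal sketch of that normalization, inferred from the rows shown rather than taken from varlens's source; the function names are hypothetical and only single-nucleotide variants are handled.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A<->T, C<->G); other characters pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def pyrimidine_context(five_prime: str, ref: str, alt: str,
                       three_prime: str) -> Tuple[str, str, str]:
    """Return (context_5_prime, context_3_prime, context_mutation) on the pyrimidine strand.

    Inputs are in reference-genome orientation. If the reference base is a
    purine (A or G), the whole site is reverse-complemented: the flanks swap
    sides and are reverse-complemented, and ref/alt are complemented, so A>C
    becomes T>G.
    """
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("this sketch only handles single-nucleotide variants")
    if ref.upper() in ("C", "T"):
        # Already on the pyrimidine strand: report as-is.
        return five_prime, three_prime, f"{ref}>{alt}"
    return (
        reverse_complement(three_prime),
        reverse_complement(five_prime),
        f"{reverse_complement(ref)}>{reverse_complement(alt)}",
    )
```

For the PPP6R2 row, the reference-orientation flanks would be CCCTCCCTCCCTCCC and GGCCTTCTCTGACTA around the A; normalizing them yields the TAGTCAGAGAAGGCC / GGGAGGGAGGGAGGG / T>G values shown in the output.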
It's up on PyPI now: https://pypi.python.org/pypi/varlens
We want to run this command from within Biokepi pipelines (to merge multiple VCFs):
This will merge the variants found in 4 VCFs and annotate each with its read-evidence support from the 2 originating DNA alignments and from an alignment of the RNA-seq reads.
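The merge step amounts to taking the union of variants across VCFs, keyed on locus and alleles, while recording which files each variant came from (compare the sources column in the earlier output, where a variant present in both files is listed as "1.vcf 2.vcf"). A minimal sketch of that bookkeeping, assuming variants have already been parsed into (contig, interbase_start, interbase_end, ref, alt) tuples; it does not read VCF files, the name merge_variants is hypothetical, and it is not how varlens or Biokepi actually implement the merge.

```python
from typing import Dict, Iterable, List, Tuple

# (contig, interbase_start, interbase_end, ref, alt)
Variant = Tuple[str, int, int, str, str]


def merge_variants(sources: Dict[str, Iterable[Variant]]) -> List[Tuple[Variant, str]]:
    """Union variants from several named sources.

    Returns one (variant, sources) pair per distinct variant, in first-seen
    order, where sources is a space-separated list of the names of every
    source that contained the variant.
    """
    seen: Dict[Variant, List[str]] = {}
    for name, variants in sources.items():
        for variant in variants:
            names = seen.setdefault(variant, [])
            if name not in names:  # ignore a variant repeated within one file
                names.append(name)
    return [(variant, " ".join(names)) for variant, names in seen.items()]
```

Using the loci from the output above, merging {"1.vcf": [...five variants...], "2.vcf": [...two variants...]} gives six distinct variants, with the CELSR1 G>A site attributed to "1.vcf 2.vcf". The read-evidence columns would then be computed per merged variant against each alignment.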