Closed JPFinnigan closed 8 years ago
Thanks for the suggestions @JPFinnigan . I think I can address the first two points by adding a new command line tool to directly annotate a collection of variants with read counts. I actually need this for some work I'm doing now too. I'll take a stab at it tomorrow.
The third point is a bit tougher as varcode
automatically drops the shared prefix of the ref and alt alleles. The 'GT>G' style for deletions is also confusing for read counting since it's not clear how to treat a read that has a deletion at that position (120036407) but with a different base preceding it (e.g. an A at position 120036406 instead of a G). I'll look into adding an option for this though if it's not too complicated.
This needs some more testing and documentation, but the varlens-variants
tool now supports counting reads that match the ref / alt / total. Here's an example:
$ varlens-variants \
--include-read-evidence \
--variants test/data/CELSR1/vcfs/vcf_1.vcf \
--reads test/data/CELSR1/bams/bam_0.bam \
--variant-genome b37 \
--out /tmp/results.csv
This gives csv results like this:
genome,contig,interbase_start,interbase_end,ref,alt,num_alt,num_ref,total_depth
GRCh37,22,21829554,21829555,T,G,0.0,0.0,0.0
GRCh37,22,46931059,46931060,A,C,0.0,50.0,50.0
GRCh37,22,46931061,46931062,G,A,0.0,51.0,51.0
GRCh37,22,50636217,50636218,A,C,0.0,0.0,0.0
GRCh37,22,50875932,50875933,A,C,0.0,0.0,0.0
Other features:
--reads
and --variants
to varlens-variants
. The result csv will have read counts for all variant / bam combinations.donor
)varlens-variants
can add a column that gives the predicted affinity of the tightest-binding neo-epitope generated by each variantSee varlens-variants -h
if you're interested in any of these.
Interested to hear how this works for you if you end up using it.
@JPFinnigan did this address your issue? Let me know if there's anything else you need
Hey Tim,
I'll talk a look ASAP, and then get back to you. Thanks for looking into this!
Hey Tim,
Finally got around to using this. Ran varlens-variants w/ the following input:
JPF-MBP:varlens johnfinnigan$ varlens-variants \
> --include-read-evidence \
> --reads ~/Desktop/TEMP/Results/RNA/Tumor_B16.F10/ISMMS/Tumor_B16.F10_0810.127A/BAM/HISAT2/Tumor_B16.F10_0810.127A.HISAT2.sorted.bam \
> --reads ~/Desktop/TEMP/Results/RNA/Tumor_B16.F10/ISMMS/Tumor_B16.F10_0810.131B/BAM/HISAT2/Tumor_B16.F10_0810.131B.HISAT2.sorted.bam \
> --variants ~/Desktop/TEMP/Results/WES/Tumor_B16.F10_0810/ISMMS/VCF/MuTect/Tumor_B16_F10_0810.mutect.targets.pass.vcf \
> --variants ~/Desktop/TEMP/Results/WES/Tumor_B16.F10_0810/ISMMS/VCF/Strelka/results/passed.somatic.snvs.vcf \
> --out ~/Desktop/Tumor.B16.F10.0810.mutect.targets.pass.vcf_Tumor.B16.F10.0810.strelka.passed.snvs.vcf_Tumor.B16.F10.0810.127A.HISAT2.sorted.bam_Tumor.B16.F10.0810.131B.HISAT2.sorted.bam_varlens.csv
Received the following error message:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/bin/varlens-variants", line 9, in <module>
load_entry_point('varlens', 'console_scripts', 'varlens-variants')()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 547, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2720, in load_entry_point
return ep.load()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2380, in load
return self.resolve()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2386, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/Users/johnfinnigan/Desktop/Utilities/Varlens/varlens/varlens/commands/variants.py", line 22, in <module>
from .. import variant_includes
File "/Users/johnfinnigan/Desktop/Utilities/Varlens/varlens/varlens/variant_includes.py", line 12, in <module>
from . import mhc_binding
File "/Users/johnfinnigan/Desktop/Utilities/Varlens/varlens/varlens/mhc_binding.py", line 19, in <module>
import topiary
File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/__init__.py", line 7, in <module>
from .predict_epitopes import (
File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/predict_epitopes.py", line 20, in <module>
from .commandline_args import (
File "/Users/johnfinnigan/Desktop/Utilities/Topiary/topiary/topiary/commandline_args.py", line 185, in <module>
"netmhc": mhctools.NetMHC,
AttributeError: 'module' object has no attribute 'NetMHC'
I'm thinking this means that varlens is looking for, but not finding NetMHC. Presumably, this is because mhctools was just updated to go from NetMHC
to NetMHC3
NetMHC4
. Also, this brings up a f/u question--are you sure you want to include MHC binding predictions by default in varlens-variants
?
Thanks for the bug report @JPFinnigan . As it looks like you determined, this was due to a topiary issue https://github.com/hammerlab/mhctools/pull/48#issuecomment-186895428
It's a reasonable point though that maybe we shouldn't try to import topiary (and thus depend on mhctools) unless the user is specifically asking for variant predictions. Reopening this issue to potentially address that
Total Counts. If would be great if output of --varlens-allele-support included a column containing the total count for all alleles at a given locus.
REF/ALT It would also be extremely helpful if REF/ALT status were carried forward from the .vcf. Example:
Deletions When working with .vcf containing deletions, varlens fills in the 'allele' column with a blank. Perhaps it would be better if the results were more consistent with the .vcf reporting of deletions. For example, varlens currently reports as follows:
Whereas the .vcf output for this locus looks like this:
CHROM | POS | ID | REF | ALT |
------------ | ------------- | ------------ | ------------- | ------------ | chr1|120036406|.|GT|G|
The varlens reporting convention for deletions makes it a little difficult to go back to the .vcf and search for the allele of interest, since the ALT allele in varlens "_" differs from the alt allele in the .vcf "G". I know that this is an issue with legacy base- versus the preferred space-counting coordinate systems, but perhaps we can come up with an interim solution in lieu of wider adoption of space-counting coordinate systems in .vcf?