Closed ckeown closed 9 years ago
I'm guessing fetch
is not the problem here, but snp.ALT
is None
.
By the way, if your VCF file is grouped by chromosome, you can also get records one chromosome at a time using itertools.groupby
instead of fetch
. This would be a bit more lightweight since fetch
only works on tabix-indexed files and requires pysam.
For example:
import itertools
import operator
import vcf
reader = vcf.reader(filename='bla.vcf')
for chrom, records in itertools.groupby(reader, operator.attrgetter('CHROM')):
print 'Chromosome:', chrom
for record in records:
print record
Thanks for the response, Martijn. I'm pretty sure it's not snp.ALT because the below code gives me the same error:
import vcf
vcf_reader = vcf.Reader(open('../SNPs/snps_spretus_chr19.recode.vcf.gz', 'r'))
snps_vcf = vcf_reader.fetch('19', 0)
for snp in snps_vcf:
print "Hello."
I will try the other method, but it's not ideal, because I can't guarantee the file is sorted. Thanks!
Could you paste the complete stack trace for the error?
Hi I am trying to read the cosmic mutations vcf file one chromosome at a time, and I am having the same issue as stated above, is there any solution to this? Below is my code:
cosmic_vcf = vcf.Reader(open('CosmicMutantExport.vcf.gz','r'))
for lines in cosmic_vcf.fetch(chrom):
print(records.CHROM,records.POS, records.INFO['GENE'],records.REF,records.ALT)
where chrom is an iterator over the chromosomes.
The first argument to fetch
should be a string, not an iterator.
Could you paste some minimal but complete code that fails? Perhaps as a gist with a small VCF file included?
What versions of Python/PyVCF/pysam are you using?
I am using python 2.7.6, pysam version 0.8.0 and pyvcf version 0.6.7
I tried the following pieces of code import pysam import vcf cosmic_vcf = vcf.Reader(open('CosmicMutantExport.vcf.gz','r')) for lines in cosmic_vcf.fetch('1'): print(records.CHROM,records.POS, records.INFO['GENE'],records.REF,records.ALT)
and got the following error
Traceback (most recent call last):
File "test", line 4, in
Here is a sample from my vcf file
1 69345 COSM911918 C A . . GENE=OR4F5;STRAND=+;CDS=c.255C>A;AA=p.I85I;CNT=1 1 69523 COSM426644 G T . . GENE=OR4F5;STRAND=+;CDS=c.433G>T;AA=p.G145C;CNT=1 1 69538 COSM75742 G A . . GENE=OR4F5;STRAND=+;CDS=c.448G>A;AA=p.V150M;CNT=1 1 69539 COSM1343690 T C . . GENE=OR4F5;STRAND=+;CDS=c.449T>C;AA=p.V150A;CNT=1 1 69540 COSM1560546 G T . . GENE=OR4F5;STRAND=+;CDS=c.450G>T;AA=p.V150V;CNT=1 1 69569 COSM1599955 T C . . GENE=OR4F5;STRAND=+;CDS=c.479T>C;AA=p.L160P;CNT=2 1 69591 COSM3419425 C T . . GENE=OR4F5;STRAND=+;CDS=c.501C>T;AA=p.V167V;CNT=1 1 861390 COSM460103 G C . . GENE=SAMD11;STRAND=+;CDS=c.69G>C;AA=p.P23P;CNT=1 1 865609 COSM336143 C T . . GENE=SAMD11;STRAND=+;CDS=c.147C>T;AA=p.P49P;CNT=1 1 865617 COSM3790304 C G . . GENE=SAMD11;STRAND=+;CDS=c.155C>G;AA=p.S52C;CNT=1
Ah indeed, omitting the start position is only possible with the current master branch, not with the released 0.6.7. (The commit is 4fba62c.)
You can try the current master branch with:
pip install -e git+https://github.com/jamescasbon/PyVCF.git#egg=PyVCF
As an alternative, you could use the code I suggested in an earlier comment.
I think this should also answer @ckeown's question (omitting the end position is possible in 0.6.7, but only returns one call at that exact position, or None
if there is none).
Thank you that resolved the issue
The nugget of code below works beautifully:
I want to change it, however, so that I can read the whole chromosome in without having to specify a range on the positions (because ultimately I want to run this for all chromosomes). I have tried
snps_vcf = vcf_reader.fetch('19', 0)
. However, I get an error:I have similarly tried leaving the starting position out as well, but that gives me this error:
Any suggestions? The documentation states that I should be able to run both of the commands above. I would include the version number, but I can't seem to find it because
vcf.__version__
doesn't exist. It was installed within the last week, however, using pip, so it's probably the most recent.Thanks in advance for your help! chris