mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx
Other
449 stars 75 forks source link

FastaVariant example not working #129

Closed daquang closed 6 years ago

daquang commented 6 years ago

I'm trying to run the FastaVariant example in the README, but I'm getting the following error:

In [1]: from pyfaidx import FastaVariant

In [2]: consensus = FastaVariant('tests/data/chr22.fasta', 'tests/data/chr22.vcf
   ...: .gz', het=True, hom=True)
/home/daquang/Software/pyfaidx/pyfaidx/__init__.py:959: RuntimeWarning: Using sample NA06984 genotypes.
  warnings.warn("Using sample {0} genotypes.".format(self.sample), RuntimeWarning)

In [3]: consensus['22'].variant_sites
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/Software/pyfaidx/pyfaidx/__init__.py in __getitem__(self, rname)
    868         try:
--> 869             return self.records[rname]
    870         except KeyError:

KeyError: '22'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-3-0c3cd66e8fab> in <module>()
----> 1 consensus['22'].variant_sites

~/Software/pyfaidx/pyfaidx/__init__.py in __getitem__(self, rname)
    869             return self.records[rname]
    870         except KeyError:
--> 871             raise KeyError("{0} not in {1}.".format(rname, self.filename))
    872 
    873     def __repr__(self):

KeyError: '22 not in tests/data/chr22.fasta.'
mdshw5 commented 6 years ago

Did you first run tests/data/download_gene_fasta.py? If so could you please inspect the file to make sure it's not empty? I have a test for this, and it's passing currently so I suspect it's an issue with your chr22.fasta file.

https://github.com/mdshw5/pyfaidx/blob/7b4d8d7aceadaa1fde05846e854e6eccdba38b77/tests/test_FastaVariant.py#L21-L28

daquang commented 6 years ago

Yes, you are correct. The chr22.fasta file was empty. I manually downloaded the human_b36_male.fa from 1000 genomes and tried it again, and it worked. Perhaps chr22.fasta is not being made correctly because I'm using python 3.6 or I am missing some dependencies?

Also, perhaps you can make FastaVariant a little faster by using cyvcf2 instead of PyVCF?

mdshw5 commented 6 years ago

That’s a great idea. Maybe @brentp can make a suggestion about integration. I’ll look into it.

daquang commented 6 years ago

That would be great if we can we get this going! I have to grab lots of sequences efficiently with SNPs properly replacing single letters, and my VCF files may have millions of variants. Every small speedup helps.

daquang commented 6 years ago

Do you think you can also provide some documentation on what some of the arguments of FastaVariant (e.g. hom, het, call_filter) mean?

mdshw5 commented 6 years ago

Sorry for the delay here. I'm planning to re-write the documentation when I have time and will certainly add more clarification about the variant filters.