openvax / pyensembl

Python interface to access reference genome features (such as genes, transcripts, and exons) from Ensembl
Apache License 2.0

Load mouse VCF files #112

Closed (iskandr closed 9 years ago)

iskandr commented 9 years ago
In [2]: varcode.load_vcf("mouse_vcf_dbsnp_chr1_partial.vcf")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-2afb1e38979a> in <module>()
----> 1 varcode.load_vcf("mouse_vcf_dbsnp_chr1_partial.vcf")

/Users/iskander/code/varcode/varcode/vcf.py in load_vcf(path, genome, only_passing, ensembl_version, reference_name, reference_vcf_key, allow_extended_nucleotides, max_variants)
     99                 ensembl_version,
    100                 reference_name,
--> 101                 reference_vcf_key)
    102
    103         for record in handle.vcf_reader:

/Users/iskander/code/varcode/varcode/vcf.py in make_ensembl(vcf_reader, ensembl_version, reference_name, reference_vcf_key)
    483         else:
    484             reference_path = vcf_reader.metadata[reference_vcf_key]
--> 485             reference_name = infer_reference_name(reference_path)
    486         ensembl_version = ensembl_release_number_for_reference_name(
    487             reference_name)

/Users/iskander/code/varcode/varcode/reference_name.py in infer_reference_name(path)
     33
     34     raise ValueError(
---> 35         "Failed to infer human genome assembly name for %s" % path)
     36
     37 def ensembl_release_number_for_reference_name(name):

ValueError: Failed to infer human genome assembly name for GCF_000001635.22

Seems like this will require us to download a correspondence table between assembly names, so that an accession like GCF_000001635.22 can be mapped to a reference name we recognize (e.g. http://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi?REQUEST=SHOW_STATISTICS)
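
A rough sketch of what such a correspondence table lookup might look like. This is not Varcode's actual fix: the function name `infer_reference_name_from_accession`, the table `ACCESSION_TO_ASSEMBLY`, and its single entry are all assumptions made for illustration, and any real table would need to be checked against NCBI's assembly records.

```python
import re

# Hypothetical correspondence table from assembly accession (without the
# version suffix) to assembly name. The single entry is an assumption for
# this sketch; ignoring the version suffix is a simplification, since
# different versions of one accession can denote different assemblies.
ACCESSION_TO_ASSEMBLY = {
    "GCF_000001635": "GRCm38",  # mouse, assumed mapping
}

# Matches accessions such as "GCF_000001635.22"; the version is optional.
_ACCESSION_PATTERN = re.compile(r"(GC[AF]_\d{9})(?:\.\d+)?")


def infer_reference_name_from_accession(path, table=ACCESSION_TO_ASSEMBLY):
    """Return the assembly name for the first accession found in `path`."""
    match = _ACCESSION_PATTERN.search(path)
    if match is None:
        raise ValueError("No assembly accession found in %s" % path)
    accession = match.group(1)
    if accession not in table:
        raise ValueError("Unknown assembly accession %s" % accession)
    return table[accession]
```

Called on the reference string from the traceback, `infer_reference_name_from_accession("GCF_000001635.22")` would look up `GCF_000001635` in the table, and anything without a recognizable accession would still raise a `ValueError`.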

iskandr commented 9 years ago

This now works (via some hackery in Varcode)