mdshw5 / pyfaidx

Efficient pythonic random access to fasta subsequences
https://pypi.python.org/pypi/pyfaidx
Other
459 stars 75 forks

Cannot use FastaVariant without genotype #152

Closed: victorlin closed this issue 5 years ago

victorlin commented 5 years ago

I am trying to generate a consensus sequence from the ClinVar VCF file (FTP link: ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar_20190715.vcf.gz). However, the file does not contain any genotype information (no samples). This seems to trigger an IndexError during parsing in __init__.py (line 1111).

I'm fairly new to working with VCF files in general - please let me know if I'm overlooking anything.
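A VCF with no samples (a "sites-only" VCF, as described for the ClinVar file above) has only the eight fixed columns CHROM through INFO. The FORMAT column and one column per sample appear only when genotypes are present. The sketch below checks this before a file is handed to FastaVariant. It reads only the `#CHROM` header line; the function name `vcf_sample_names` is hypothetical and is not part of pyfaidx.

```python
import gzip
from typing import List


def vcf_sample_names(path: str) -> List[str]:
    """Return the sample names in a VCF header; an empty list means sites-only.

    Hypothetical helper (not part of pyfaidx). Handles plain-text and
    gzip-compressed (.gz) VCFs.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # Indices 0-7 are CHROM..INFO, index 8 is FORMAT,
                # and sample names start at index 9.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached data lines without seeing a #CHROM header
    raise ValueError(f"{path}: no #CHROM header line found")
```

An empty result means there are no genotypes for a sample-based tool to read.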

mdshw5 commented 5 years ago

This might be the wrong tool for the task. Currently, FastaVariant only incorporates genotypes from individual samples in a VCF, and it only applies SNPs and MNPs, not indels (see #84). If you're trying to add ClinVar alleles prior to alignment, I might suggest you use something like hisat2 to build a graph-based index. Otherwise, if you want to make a consensus FASTA from this VCF, I think GATK's FastaAlternateReferenceMaker will do the trick.
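To show what "only SNPs and MNPs, not indels" means in practice, here is a toy sketch that substitutes equal-length REF/ALT alleles into a reference string and skips anything that changes length. This is not pyfaidx's implementation. The `(start, REF, ALT)` tuples are a hypothetical simplified record: `start` is 0-based (VCF POS minus 1), and each record has a single ALT allele. Because there are no genotypes, the sketch applies every ALT allele it is given. Deciding which alleles to apply is exactly the step for which FastaVariant needs sample data.

```python
from typing import Iterable, List, Tuple

# Hypothetical simplified record: (0-based start, REF allele, single ALT allele).
Record = Tuple[int, str, str]


def apply_substitutions(reference: str, records: Iterable[Record]) -> str:
    """Apply SNPs/MNPs (len(REF) == len(ALT)) to `reference`; skip indels."""
    seq: List[str] = list(reference)
    for start, ref, alt in records:
        if len(ref) != len(alt):
            # Indel: it would shift all downstream coordinates, so skip it.
            continue
        # Sanity-check REF against the original (unmodified) reference.
        if reference[start:start + len(ref)].upper() != ref.upper():
            raise ValueError(f"REF {ref!r} does not match reference at {start}")
        seq[start:start + len(ref)] = alt  # replaces len(ref) bases one-for-one
    return "".join(seq)
```

For example, `apply_substitutions("ACGTACGT", [(1, "C", "T"), (4, "AC", "GG"), (6, "G", "GA")])` applies the SNP and the MNP but skips the `G`→`GA` insertion. The result is `"ATGTGGGT"`.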

victorlin commented 5 years ago

Thanks for the pointers. I'll look into the other tools you've mentioned.