macarthur-lab / clinvar

This repo provides tools to convert ClinVar data into a tab-delimited flat file, and also provides that resulting tab-delimited flat file.

Fix multi allele parsing #34

Closed XiaoleiZ closed 7 years ago

XiaoleiZ commented 7 years ago

Update the script to parse the latest ClinVar XML release (May 2017), and update the resulting output files. The variant_summary.txt file currently provided by ClinVar has missing values and differs from what the ClinVar webpage shows, so the older version of the txt file (201703) is used instead. To run the pipeline with the old txt file:

python2.7 master.py --b37-genome /path/to/b37.fa --b38-genome /path/to/b38.fa -E /path/to/ExAC.r1.sites.vep.vcf.gz -S /path/to/variant_summary_2017-03.txt.gz -GG /path/to/gnomad.genomes.r2.0.1.sites.coding.autosomes_and_X.vcf.gz -GE /path/to/gnomad.exomes.r2.0.1.sites.vcf.gz
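For anyone scripting this run, here is a minimal sketch of assembling the same invocation as an argument list and checking that the input files exist first. The helper names `build_master_command` and `missing_inputs` are hypothetical and not part of this repository; only the flags shown in the command above are used, and the paths remain placeholders.

```python
import os


def build_master_command(paths, python="python2.7", script="master.py"):
    """Assemble the master.py invocation shown above as an argument list.

    `paths` maps each flag from the command above to a file path.
    Hypothetical helper for illustration; not part of the repository.
    """
    flag_order = ["--b37-genome", "--b38-genome", "-E", "-S", "-GG", "-GE"]
    # Refuse to build a partial command if any flag has no path.
    missing = [flag for flag in flag_order if flag not in paths]
    if missing:
        raise ValueError("missing paths for flags: " + ", ".join(missing))
    command = [python, script]
    for flag in flag_order:
        command.extend([flag, paths[flag]])
    return command


def missing_inputs(paths):
    """Return the configured input paths that do not exist on disk, sorted."""
    return sorted(p for p in paths.values() if not os.path.exists(p))
```

Once `missing_inputs(paths)` returns an empty list, the result of `build_master_command(paths)` can be passed to `subprocess.check_call`. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.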

I also added a limitations section to README.md to explain the issue.

bw2 commented 7 years ago

Thank you for the fix!