Convert the pipeline to hail for .tsv processing steps?

macarthur-lab / clinvar

This repo provides tools to convert ClinVar data into a tab-delimited flat file, and also provides that resulting tab-delimited flat file.


Convert the pipeline to hail for .tsv processing steps? #50

Open bw2 opened 6 years ago

bw2 commented 6 years ago

Next time the pipeline needs updates, we should probably convert all steps that follow .xml parsing to a Spark-based hail pipeline.

Currently, the steps that generate the ClinVar x gnomAD tables take hours to run, so I skipped them for the latest release. Hail would be able to perform these joins much more efficiently.
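For anyone picking this up, here is a rough sketch of the kind of keyed join those steps perform. It is plain Python rather than Hail, and it is not the pipeline's actual code. The key columns (`chrom`, `pos`, `ref`, `alt`), the `af` and `clinical_significance` columns, the `gnomad_` prefix, and the inline sample rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical join key; the real pipeline's column names may differ.
KEY_COLUMNS = ("chrom", "pos", "ref", "alt")


def read_tsv(handle):
    """Read a tab-delimited file into a list of dict rows."""
    return list(csv.DictReader(handle, delimiter="\t"))


def left_join(clinvar_rows, gnomad_rows, key_columns=KEY_COLUMNS):
    """Annotate each ClinVar row with the matching gnomAD row, if any."""
    # Index gnomAD once by variant key so each lookup is a dict access.
    gnomad_by_key = {
        tuple(row[c] for c in key_columns): row for row in gnomad_rows
    }
    joined = []
    for row in clinvar_rows:
        match = gnomad_by_key.get(tuple(row[c] for c in key_columns), {})
        # Prefix gnomAD columns so they cannot collide with ClinVar columns.
        extra = {
            f"gnomad_{name}": value
            for name, value in match.items()
            if name not in key_columns
        }
        joined.append({**row, **extra})
    return joined


# Made-up sample data, only to show the shape of the inputs.
CLINVAR_TSV = (
    "chrom\tpos\tref\talt\tclinical_significance\n"
    "1\t100\tA\tG\tPathogenic\n"
    "2\t200\tC\tT\tBenign\n"
)
GNOMAD_TSV = "chrom\tpos\tref\talt\taf\n1\t100\tA\tG\t0.0001\n"

clinvar = read_tsv(io.StringIO(CLINVAR_TSV))
gnomad = read_tsv(io.StringIO(GNOMAD_TSV))
result = left_join(clinvar, gnomad)
```

This version holds the whole gnomAD table in memory on one machine, which is the part that does not scale. A Hail version would key both tables on the variant and let Spark distribute the same join.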