googlegenomics / gcp-variant-transforms

GCP Variant Transforms
Apache License 2.0
Store VCF headers along with Avro files #640

Open samanvp opened 4 years ago

samanvp commented 4 years ago

In v0.9.0 we started offering the --keep_intermediate_avro_files flag. Avro files are a great candidate for long-term storage of variant data for several reasons, mainly because they can be loaded into BigQuery quickly and at no cost.

The only data missing from the Avro files is the information in the VCF header. Storing the headers alongside the Avro files will enable us to recover the original VCF files exactly.
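
To make the idea concrete, here is a minimal sketch of one way the header could be kept next to the Avro output. It is not how Variant Transforms does or will do it: the function names, the `.header.txt` suffix, and the sidecar layout are all hypothetical, and it works on local paths only (no Cloud Storage or Beam).

```python
import gzip
from pathlib import Path
from typing import List


def read_vcf_header_lines(vcf_path: str) -> List[str]:
    """Return the leading '#' lines of a VCF (meta-information plus the #CHROM line)."""
    # Plain-text and gzip/bgzip-compressed VCFs are both handled.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    header_lines = []
    with opener(vcf_path, "rt") as vcf:
        for line in vcf:
            if not line.startswith("#"):
                break  # first data record reached: the header is over
            header_lines.append(line.rstrip("\n"))
    return header_lines


def write_header_sidecar(vcf_path: str, avro_dir: str) -> Path:
    """Write the header of vcf_path to <avro_dir>/<vcf file name>.header.txt.

    Hypothetical layout: one sidecar text file per input VCF, stored in the
    same directory as the intermediate Avro files.
    """
    out_dir = Path(avro_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    sidecar = out_dir / (Path(vcf_path).name + ".header.txt")
    sidecar.write_text("\n".join(read_vcf_header_lines(vcf_path)) + "\n")
    return sidecar
```

Keeping one sidecar per input file keeps the header byte-for-byte, which matters if the goal is to rebuild the original VCF; embedding the header in the Avro schema or file metadata would be another option, but that is not explored here.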