madprime / cgivar2gvcf

Lossy conversion of Complete Genomics var file to VCF
MIT License
5 stars 3 forks source link

cgivar2gvcf

Conversion of Complete Genomics var file to gVCF

This conversion assumes that the genome file is build 37 and does not currently support other builds.

Installation

This is Python package in the Python Package Index (PyPI). You can install it with pip install cgivar2gvcf. This installation will also install the twobitreader package.

Command line usage

You can run this tool on the command line like this:

python -m cgivar2gvcf -h

The above command will display the program's options.

Notably, you need a copy of the UCSC 2bit reference genome to perform conversion. The command line tool expects you to provide a directory where this file exists (it should have the name hg19.2bit). If it's not present, the tool with download a copy into this directory.

An example command for a variant-only VCF file (not gVCF):

python -m cgivar2gvcf -d files/ -i var-GS00253-ASM.tsv.bz2 --var-only -o GS00253-vcf-from-var.vcf.bz2

Module usage

convert_to_file

Writes the new VCF file to the specified output destination.

convert_to_file(cgi_input, output_file, twobit_ref, twobit_name, var_only=False)

convert

Returns a generator object that yields lines of the VCF file.

convert(cgi_input, twobit_ref, twobit_name, var_only=False)

get_reference_genome_file

Convenience function for finding the UCSC reference genome in a specified directory, and downloading it if it's not present.

get_reference_genome_file(refseqdir, build)