mskcc / vcf2maf

Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

Enable AF reporting for both gnomad/exac VCFs, whether GRCh37/GRCh38 #92

Open ckandoth opened 7 years ago

ckandoth commented 7 years ago

ExAC allele counts and frequencies are already in the VEP cache, but we use the ExAC_nonTCGA VCF instead, minus a few known somatic variants related to hematopoietic clonal expansion. Figure out how to avoid needing both a custom VCF and the ExAC plugin. This would also enable ExAC AFs for GRCh38 variants.

If that doesn't work, we need to construct another custom VCF for GRCh38 variants.

ckandoth commented 7 years ago

Decided to go the custom VCF route. The latest commit already removes the need for the ExAC plugin: the vcf2maf script now parses the ExAC VCF for allele counts, and takes a path to the ExAC VCF as an argument. We simply need to add instructions to the gist for creating a GRCh38 version of that VCF: https://gist.github.com/ckandoth/f265ea7c59a880e28b1e533a6e935697
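For readers unfamiliar with how allele counts turn into frequencies, here is a minimal Python sketch of the idea. It is not the vcf2maf implementation (vcf2maf is a Perl script). The function names and the default `AC`/`AN` keys are assumptions for illustration, and the actual ExAC VCF may use differently named INFO keys.

```python
def parse_info(info):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def allele_frequencies(info, ac_key="AC", an_key="AN"):
    """Return one AF per ALT allele, computed as allele count / allele number.

    ac_key and an_key are hypothetical defaults; pass whichever INFO keys
    the VCF in question actually uses.
    """
    fields = parse_info(info)
    an = int(fields[an_key])
    # AC holds one comma-separated count per ALT allele at multiallelic sites.
    counts = [int(c) for c in fields[ac_key].split(",")]
    return [c / an if an else None for c in counts]
```

Calling `allele_frequencies("AC=2,6;AN=8;DB")` yields one frequency per ALT allele; a site with `AN=0` yields `None` rather than dividing by zero.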

leiendeckerlu commented 7 years ago

Hi @ckandoth , do you still intend to provide instructions for the GRCh38 variant of the ExAC VCF? That would be very useful and highly appreciated! Thank you.

ckandoth commented 7 years ago

@leiendeckerlu my first attempt at creating a GRCh38 ExAC VCF used liftOver (via vcf2vcf --remap-chain data/GRCh37_to_GRCh38.chain). It fails to map many variants, especially indels. I have since learned new tricks to handle indels, but ultimately a lifted-over VCF will never be as good as a GRCh38 gnomAD/ExAC VCF generated from GRCh38-aligned BAMs at the MacArthur Lab. I was hoping they would do that in the next few months, but it doesn't seem to be on their radar. I'll leave this ticket open in case someone can contribute a pull request. Otherwise, it may be a few months until I can give this another shot.
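As a rough illustration of one way a chain-based remap can lose multi-base variants while still mapping single positions, here is a toy Python sketch. It is not the liftOver or vcf2vcf algorithm, and `Block`, `lift_position`, and `lift_interval` are hypothetical names. It assumes a chain can be simplified to a list of ungapped aligned blocks, and that an interval only maps when it sits entirely inside one block.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """One ungapped aligned block (0-based coordinates, toy model)."""
    src_start: int
    dst_start: int
    size: int


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a single source position, or return None if it falls in a gap."""
    for b in blocks:
        if b.src_start <= pos < b.src_start + b.size:
            return b.dst_start + (pos - b.src_start)
    return None


def lift_interval(blocks: List[Block], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Map a half-open interval [start, end) only if one block contains all of it."""
    for b in blocks:
        if b.src_start <= start and end <= b.src_start + b.size:
            offset = b.dst_start - b.src_start
            return (start + offset, end + offset)
    return None
```

In this model, with two blocks separated by an unaligned gap, a position inside either block maps, but an interval that starts in one block and ends in the next returns `None`. Real tools apply more elaborate rules, so treat this only as intuition for why longer reference alleles are more exposed to mapping failures.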