hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

Failure to annotate variants with dbNSFP #317

Closed mzekavat closed 8 years ago

mzekavat commented 8 years ago

I'm running the following script:

/psych/genetics_data/working/cseed/bin/hail read -i ${input_vds} \
  annotatevariants tsv file:///medpop/esp2/mzekavat/Estonia/UPDATED_TOOLS/dbNSFPv3.2/dbNSFP3.2a.ALLChr.bgz \
  -r va.dbNSFP \
  -t 'SIFT_pred: String, PROVEAN_pred: String, Polyphen2_HDIV_pred: String, Polyphen2_HVAR_pred: String, LRT_pred: String, MutationTaster_pred: String, MutationAssessor_pred: String, FATHMM_pred: String, MetaSVM_pred: String, MetaLR_pred: String, CADD_phred: Double, Eigen-raw: Double, Eigen-phred: Double, Eigen-raw_rankscore: Double' \
  -v "#chr,pos(1-based),ref,alt" \
  -m "." \
  annotatevariants expr -c 'va.of8 = (if ("D" ~ va.dbNSFP.SIFT_pred) 1 else 0) + (if ("D" ~ va.dbNSFP.PROVEAN_pred) 1 else 0) + (if ("D" ~ va.dbNSFP.Polyphen2_HDIV_pred) 1 else 0) + (if ("D" ~ va.dbNSFP.Polyphen2_HVAR_pred) 1 else 0) + (if ("D" ~ va.dbNSFP.LRT_pred) 1 else 0) + (if ("H" ~ va.dbNSFP.MutationAssessor_pred || "M" ~ va.dbNSFP.MutationAssessor_pred) 1 else 0) + (if ("D" ~ va.dbNSFP.MutationTaster_pred) 1 else 0) + (if ("D" ~ va.dbNSFP.FATHMM_pred) 1 else 0)' \
  exportvariants -c 'v.contig,v.start,v.ref,v.alt,va.of8,va.dbNSFP.MetaSVM_pred,va.dbNSFP.MetaLR_pred,va.dbNSFP.CADD_phred,va.dbNSFP.Eigen-raw,va.dbNSFP.Eigen-phred,va.dbNSFP.Eigen-raw_rankscore' -o /user/mzekavat/MiGen/dbNSFP.MiGen.tsv

and I'm getting an error. The log is here: /medpop/esp2/mzekavat/MiGen/Annotation/hail.log

I would greatly appreciate any thoughts on this as soon as possible!

tpoterba commented 8 years ago

Maryam, I ran this command in Unix:

gunzip -c  <file> | cut -f4 | sort | uniq -c
20709505 A
20934670 C
20968049 G
20693812 T
     25 alt

I think the problem is that the headers from all the files were included in the one file. I'm running another grep now to be sure.
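For anyone who wants to repeat this check without the shell pipeline, here is a rough Python sketch of the same idea: tally the values in the fourth column and count how many header lines made it into the concatenated file. The function names, the `#chr` header prefix (taken from the `-v` argument above), and the sample rows are illustrative assumptions, not part of Hail.

```python
from collections import Counter

def count_column(lines, column_index=3, sep="\t"):
    """Tally the values in one column, roughly like `cut -f4 | sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Skip lines too short to have the requested column.
        if len(fields) > column_index:
            counts[fields[column_index]] += 1
    return counts

def count_header_lines(lines, header_prefix="#chr"):
    """Count lines that look like a header; more than one suggests concatenated headers."""
    return sum(1 for line in lines if line.startswith(header_prefix))

# Tiny made-up sample standing in for the real file (hypothetical data).
sample = [
    "#chr\tpos(1-based)\tref\talt\n",
    "1\t100\tA\tC\n",
    "1\t101\tG\tT\n",
    "#chr\tpos(1-based)\tref\talt\n",
    "2\t50\tA\tC\n",
]

alt_counts = count_column(sample)
header_count = count_header_lines(sample)
print(alt_counts, header_count)
```

On the real file the lines could be streamed with `gzip.open(path, "rt")`, on the assumption that the `.bgz` is a standard block-gzipped file, which the `gzip` module can read as a series of gzip members.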

I'll fix the error message though!

tpoterba commented 8 years ago

The problem was variants in the dbNSFP file with ref == alt.
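A minimal sketch of how such rows could be located before annotating, assuming the first four tab-separated columns are chr, pos, ref, alt (as the `-v "#chr,pos(1-based),ref,alt"` argument and the `cut -f4` output suggest). The function name and sample rows are hypothetical; this is not a Hail API.

```python
def find_ref_equals_alt(lines, ref_index=2, alt_index=3, sep="\t", header_prefix="#"):
    """Return (line_number, chrom, pos, allele) for rows whose ref equals alt."""
    offenders = []
    for line_number, line in enumerate(lines, start=1):
        # Header lines (including repeated ones) are not variants.
        if line.startswith(header_prefix):
            continue
        fields = line.rstrip("\n").split(sep)
        # Ignore malformed rows that lack a ref or alt column.
        if len(fields) <= max(ref_index, alt_index):
            continue
        if fields[ref_index] == fields[alt_index]:
            offenders.append((line_number, fields[0], fields[1], fields[ref_index]))
    return offenders

# Made-up rows for illustration only (hypothetical data).
rows = [
    "#chr\tpos(1-based)\tref\talt\n",
    "1\t100\tA\tC\n",
    "1\t101\tG\tG\n",
    "2\t50\tT\tT\n",
]

bad_rows = find_ref_equals_alt(rows)
for line_number, chrom, pos, allele in bad_rows:
    print(f"line {line_number}: {chrom}:{pos} has ref == alt == {allele}")
```

Rows reported this way could be dropped from the annotation file before rerunning the command. Whether dropping them is appropriate depends on why they appear in the dbNSFP file in the first place.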