hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

infoScore from variantqc #685

seuchoi closed 8 years ago

seuchoi commented 8 years ago

I cannot pull out infoScore values from variantqc with my bgen file. Could you please check? Thank you.

cseed commented 8 years ago

@jigold What's the status of info score? Thanks!

tpoterba commented 8 years ago

This has been added as an aggregator: annotatevariants expr -c 'va.infoMetrics = gs.infoScore()'

seuchoi commented 8 years ago

@tpoterba @cseed @jigold I tested this command in dataflow, but it does not work.

Script:

hail importbgen -s impv1.hail.sample chr21impv1.bgen \
  variantqc \
  annotatevariants expr -c 'va.infoMetrc = gs.infoScore()' \
  exportvariants -c 'Chrom=v.contig,rsID = va.rsid,info=infoScore,Pos=v.start,Ref=v.ref,Alt=v.alt,MAF=va.qc.AF' -o file:///medpop/afib/schoi/projects/ukbb/Result/QC/variantQC.tsv

Log:

hail: info: running: importbgen -s impv1.hail.sample chr21impv1.bgen
hail: info: Number of BGEN files parsed: 1
hail: info: Number of samples in BGEN files: 152249
hail: info: Number of variants across all BGEN files: 982854
[Stage 0:======================================================>(155 + 1) / 156]
hail: info: Coerced almost-sorted dataset
hail: info: running: variantqc
hail: info: running: annotatevariants expr -c 'va.infoMetrc = gs.infoScore()'
hail: fatal: annotatevariants expr: 'no matching signature for 'infoScore()' on 'Aggregable[Genotype]'
<input>:1:va.infoMetrc = gs.infoScore()

Thank you

jigold commented 8 years ago

Can you try it again? @cseed updated the JAR file on data flow.

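Also, your code is not correct: in exportvariants, info=infoScore has to refer to the annotation you created, i.e. info=va.infoMetrc.score. It should be this: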

hail importbgen -s impv1.hail.sample chr21impv1.bgen \
  variantqc \
  annotatevariants expr -c 'va.infoMetrc = gs.infoScore()' \
  exportvariants -c 'Chrom=v.contig,rsID = va.rsid,info= va.infoMetrc.score,Pos=v.start,Ref=v.ref,Alt=v.alt,MAF=va.qc.AF' -o file:///medpop/afib/schoi/projects/ukbb/Result/QC/variantQC.tsv

seuchoi commented 8 years ago

I think it works :) Thanks!!!

jigold commented 8 years ago

The "score" is the INFO score computed using the IMPUTE method. You'd have to read about the method to understand why negative values are being computed. See the documentation here for more details: https://hail.is/docs/devel/index.html#infoscore_doc
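
To make the point about negative values concrete, here is a minimal Python sketch of the IMPUTE-style INFO score as it is commonly written. It is not Hail's code: the function name impute_info_score, the input layout (one tuple of genotype probabilities per sample), and returning None for an empty input are assumptions made for illustration. The linked documentation remains the authority on what Hail actually computes.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helper, not part of Hail. Each element of `gps` is
# (P(hom-ref), P(het), P(hom-alt)) for one sample at one variant.
def impute_info_score(gps: Sequence[Tuple[float, float, float]]) -> Optional[float]:
    n = len(gps)
    if n == 0:
        return None  # assumption: no samples means no score

    # e_i: expected alt-allele dosage; f_i: expected squared dosage.
    e = [p1 + 2.0 * p2 for _, p1, p2 in gps]
    f = [p1 + 4.0 * p2 for _, p1, p2 in gps]

    # theta: estimated alt-allele frequency from the expected dosages.
    theta = sum(e) / (2.0 * n)
    if theta <= 0.0 or theta >= 1.0:
        return 1.0  # usual convention for monomorphic variants

    # Sum of per-sample dosage variances, compared with the variance
    # expected under binomial sampling at frequency theta.
    excess = sum(fi - ei * ei for fi, ei in zip(f, e))
    return 1.0 - excess / (2.0 * n * theta * (1.0 - theta))


# Certain genotypes: no per-sample uncertainty, so the score is 1.
confident = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
# Completely uninformative probabilities: the per-sample variance exceeds
# the binomial expectation, so the score drops below zero (about -1/3).
uninformative = [(1 / 3, 1 / 3, 1 / 3)] * 4

print(impute_info_score(confident))
print(impute_info_score(uninformative))
```

The score is one minus a ratio of variances, and nothing bounds that ratio at one, which is why poorly imputed variants can come out negative.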

To export only the score, use an expression like this (assuming you have assigned the result of the infoScore() function to va.infoMetric):

exportvariants -c 'Variant = v, infoScore = va.infoMetric.score, nIncluded = va.infoMetric.nIncluded'
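
Once the scores are exported, a small script can pick out poorly imputed variants. The following is a sketch only, assuming the export is tab-separated with a header row whose column names match the expression above (Variant, infoScore, nIncluded). The function name low_info_variants, the threshold, and the sample rows are made up for illustration, and any non-numeric score is simply skipped as missing.

```python
import csv
from typing import Iterable, List, Tuple

# Hypothetical helper: returns (variant, score) pairs whose score is below
# `threshold`. `lines` can be an open file or any iterable of text lines.
def low_info_variants(lines: Iterable[str], threshold: float) -> List[Tuple[str, float]]:
    reader = csv.DictReader(lines, delimiter="\t")
    flagged = []
    for row in reader:
        try:
            score = float(row["infoScore"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric score: skip the row
        if score < threshold:
            flagged.append((row.get("Variant", ""), score))
    return flagged


# Made-up rows standing in for an exported file.
sample = [
    "Variant\tinfoScore\tnIncluded",
    "21:1000:A:G\t0.98\t100",
    "21:2000:C:T\t-0.12\t100",
    "21:3000:G:A\tNA\t0",
]
print(low_info_variants(sample, threshold=0.3))
```

With a real export, pass an open file handle instead of the list, for example `with open(path, newline="") as fh: low_info_variants(fh, 0.3)`.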