saigegit / SAIGE

Development for SAIGE and SAIGE-GENE(+)
GNU General Public License v3.0

INFO scores #84

Open MatthewMaher opened 1 year ago

MatthewMaher commented 1 year ago

I see from step 2's "--help" output (but not the online documentation, FWIW) that SAIGE has a filtering option "--minInfo" ("Minimum Info for markers"), which seems to also require "--is_imputed_data=TRUE" and which corresponds to the output column "imputationInfo".

Can you please clarify the source or calculation method of the value that is actually tested against this threshold? I believe it depends on the input type.

For BGEN input, I suspect you are calculating it yourselves, but which formula are you using? Is it 'INFO' (as defined by IMPUTE2), R2 as defined by MaCH, or R2 as defined by Minimac3/4? You call it 'info', so I think it's the first of these.
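To make the distinction concrete, here is a rough Python sketch of the three candidate statistics as I understand their published definitions. This is not taken from SAIGE's code; the function names are hypothetical, and the handling of monomorphic markers (returning 1.0) is an assumption on my part.

```python
from typing import Sequence, Tuple


def impute2_info(probs: Sequence[Tuple[float, float, float]]) -> float:
    """IMPUTE2-style INFO from per-sample genotype probabilities (p0, p1, p2)."""
    n = len(probs)
    e = [p1 + 2.0 * p2 for _, p1, p2 in probs]   # expected allele dosage
    f = [p1 + 4.0 * p2 for _, p1, p2 in probs]   # expected squared dosage
    theta = sum(e) / (2.0 * n)                   # estimated allele frequency
    if theta <= 0.0 or theta >= 1.0:
        return 1.0                               # assumed convention for monomorphic markers
    uncertainty = sum(fi - ei * ei for fi, ei in zip(f, e))
    return 1.0 - uncertainty / (2.0 * n * theta * (1.0 - theta))


def mach_r2(dosages: Sequence[float]) -> float:
    """MaCH-style R2: observed dosage variance over the variance expected under HWE."""
    n = len(dosages)
    mean = sum(dosages) / n
    p = mean / 2.0                               # estimated allele frequency
    if p <= 0.0 or p >= 1.0:
        return 1.0                               # assumed convention for monomorphic markers
    var = sum((d - mean) ** 2 for d in dosages) / n
    return var / (2.0 * p * (1.0 - p))


def minimac_r2(hap_dosages: Sequence[float]) -> float:
    """Minimac-style R2 from haplotype-level dosages (one value per haplotype, in [0, 1])."""
    m = len(hap_dosages)
    p = sum(hap_dosages) / m                     # estimated allele frequency
    if p <= 0.0 or p >= 1.0:
        return 1.0                               # assumed convention for monomorphic markers
    var = sum((d - p) ** 2 for d in hap_dosages) / m
    return var / (p * (1.0 - p))
```

The practical difference is in the input: the IMPUTE2-style value needs the full genotype probabilities, the MaCH-style value needs only per-sample dosages, and the Minimac-style value needs phased, per-haplotype dosages. The three can therefore differ on the same marker, which is why I am asking which one the "--minInfo" threshold is applied to.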

For VCF input, however, I think I see your code looking for the R2 element in the VCF's INFO column (no relation to the 'INFO' statistic). Is that correct? If so, then I believe the value (which obviously depends on who created the VCF) will often be the Minimac3/4 R2, since that is the imputation tool in heavy use via the Michigan and TOPMed imputation servers.
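For illustration, this is roughly the lookup I have in mind, sketched in Python. It is my guess at the behaviour, not SAIGE's actual code; the function name and the example INFO string are made up.

```python
from typing import Optional


def r2_from_info(info_field: str, key: str = "R2") -> Optional[float]:
    """Return the value of `key` from a VCF INFO column string, or None if absent."""
    for entry in info_field.split(";"):
        name, sep, value = entry.partition("=")
        # Exact name match, so a different key such as "ER2" is not mistaken for "R2".
        if sep and name == key:
            try:
                return float(value)
            except ValueError:
                return None  # key present but not numeric
    return None


# Made-up INFO string for an imputed marker
print(r2_from_info("AF=0.12;MAF=0.12;R2=0.87;IMPUTED"))
```

If the lookup works like this, then a VCF whose INFO column has no R2 entry would have no value to filter on, and I would be curious how "--minInfo" behaves in that case.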

Thanks for any info, and thanks for SAIGE!