mcanouil / eggla

Early Growth Genetics Longitudinal Analysis.
https://m.canouil.dev/eggla/

fix: statistics from GWAS #97

Closed mcanouil closed 1 year ago

mcanouil commented 1 year ago

Fixes

Chores

Build


After all discussions, below is the current output from the GWAS.

image
mcanouil commented 1 year ago

I added --freq, --missing, and --hardy from PLINK2. This will now produce the following output:

image

Note that the FDR, BONFERRONI, etc. columns are only there so that individual cohorts can reuse the summary statistics "locally". They are not necessarily meant for use in the meta-analysis.

@annihei Let me know if the changes suit the group; see #97 for more details.
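As an illustration of what such per-cohort adjusted columns typically contain, here is a minimal sketch in plain Python. It assumes the FDR column is a Benjamini-Hochberg adjustment, which this thread does not confirm; the function names are hypothetical and this is not the package's code.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the FDR column uses this procedure; the thread does not say so.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # rank of this p-value in ascending order
        # Enforce monotonicity by carrying the smallest value seen so far.
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(p))
print(benjamini_hochberg(p))
```

Both adjustments depend on the number of tests in the cohort's own run, which is one reason such columns are only meaningful locally.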

mcanouil commented 1 year ago

@annihei Is the final word to remove the HWE test?

Do the other changes seem fine to you? If so, could you mark the different threads above as resolved using the "Resolve conversation" buttons?

annihei commented 1 year ago

@mcanouil Yes, let's drop HWE_P. I have been trying to double-check the other changes and will comment on those.

I think we should also drop the --mach-r2-filter flag in all PLINK2 commands, as this was not in the analysis plan and we filter based on INFO in the meta-analysis team.

mcanouil commented 1 year ago

> I think we should also drop the --mach-r2-filter flag in all PLINK2 commands, as this was not in the analysis plan and we filter based on INFO in the meta-analysis team.

Removed in aa18ffc323b45c6cb9c4fcc39ad681a5f82390aa

mcanouil commented 1 year ago

HWE commands/outputs removed in c7c09c69c431d33c8dc5304a94808431f8248a98
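To make the effect of the two removals concrete, here is a minimal sketch of a PLINK2 argument list for the per-variant summary statistics. The function name, file names, and the `dosage=DS` import modifier are assumptions for illustration; this does not reflect how the package actually builds its commands.

```python
def plink2_summary_args(vcf_path, out_prefix, hardy=False, mach_r2_filter=False):
    """Assemble a PLINK2 argument list for per-variant summary statistics.

    Hypothetical helper: defaults mirror the thread's decisions
    (no --hardy, no --mach-r2-filter).
    """
    # Assumption: the input is a VCF whose dosages are stored in the DS field.
    args = ["plink2", "--vcf", vcf_path, "dosage=DS", "--freq", "--missing"]
    if hardy:
        args.append("--hardy")  # Hardy-Weinberg test, dropped in this PR
    if mach_r2_filter:
        args.append("--mach-r2-filter")  # imputation-quality filter, dropped in this PR
    args += ["--out", out_prefix]
    return args


# Hypothetical file names, for illustration only.
print(" ".join(plink2_summary_args("cohort.vcf.gz", "cohort_stats")))
```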

mcanouil commented 1 year ago

@annihei You should be able to install the version from this PR using the following command: `remotes::install_github("mcanouil/eggla@fix/issue96")`.

Note that the column names might not exactly match the analysis plan (I have not checked that yet). Either we update the analysis plan to match the R package (which would make more sense to me), or we rename the output columns within the R package.

annihei commented 1 year ago

OK, thanks. I'll try these.

mcanouil commented 1 year ago

For reference, the current GWAS results output:

image

It seems OBS_CT and MISSING_CT do not count dosage (DS) data, only hard-call genotypes.

mcanouil commented 1 year ago

The values now check out. Note that the call rate is based on hard calls for now. Should it be computed on dosages instead, i.e. from F_MISS_DOSAGE as described in https://www.cog-genomics.org/plink/2.0/formats#vmiss?
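The difference between the two definitions can be sketched as follows. This assumes, based on my reading of the `.vmiss` format page linked above, that OBS_CT is the denominator (the number of samples considered) and that the missing rate is the missing count divided by OBS_CT. The function name and the counts are made up for illustration.

```python
def call_rate(missing_ct, obs_ct):
    """Call rate as 1 - missing rate.

    Assumption: obs_ct is the denominator (samples considered), so the
    missing rate is missing_ct / obs_ct.
    """
    if obs_ct <= 0:
        raise ValueError("obs_ct must be a positive sample count")
    if not 0 <= missing_ct <= obs_ct:
        raise ValueError("missing_ct must lie between 0 and obs_ct")
    return 1.0 - missing_ct / obs_ct


# Made-up variant: 1000 samples, 50 without a hard call, none without a dosage.
hard_call_rate = call_rate(missing_ct=50, obs_ct=1000)
dosage_call_rate = call_rate(missing_ct=0, obs_ct=1000)
print(hard_call_rate, dosage_call_rate)
```

For imputed data the two can diverge noticeably, because a genotype with an uncertain dosage may have no hard call while still carrying a dosage value.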

mcanouil commented 1 year ago

To-do:

mcanouil commented 1 year ago
image