omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Issue when munging the stats #107

Closed mkoromina closed 2 years ago

mkoromina commented 2 years ago

Hi @omerwe

Just a brief enquiry regarding munging the data. I am running the PolyFun munge script on a sumstats file that includes an N column (reflecting the effective sample size) as well as N_cases and N_controls columns. I tried omitting the --n flag so that the effective sample size would be inferred automatically, but the script exits with the error 'cannot both have an N column and N_cases/N_controls columns in the sumstats file'. Moreover, when munging another sumstats file, I specified a number with the --n flag, but the job ran only when I omitted this flag, because N_cases and N_controls columns were also present in the file and this was creating a conflict.

Do you know what the best approach would be for munging sumstats files in this format?

Thanks in advance!

omerwe commented 2 years ago

Hi @mkoromina,

I generally prefer that the munge_sumstats script avoid making decisions in the presence of ambiguity. If we have all of N, N_cases, and N_controls, it's not clear what the relationship between these quantities is, and I don't want the script to assume too much.

Are there any circumstances where N_cases and N_controls are not enough?
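
To illustrate the ambiguity, here is a minimal sketch of two common conventions for what an N column might hold in a case-control study: the total sample size, or an effective sample size. The function names are hypothetical, the formula 4 / (1/N_cases + 1/N_controls) is one widely used definition of effective sample size, and nothing here is meant to describe what the munge script itself computes.

```python
def total_n(n_cases, n_controls):
    """Total sample size: simply cases plus controls."""
    return n_cases + n_controls


def effective_n(n_cases, n_controls):
    """One common definition of effective sample size for case-control data.

    Assumption: this convention is used for illustration only; a given
    sumstats file may define its N column differently.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


if __name__ == "__main__":
    # Made-up counts, chosen only to show that the two conventions differ.
    n_cases, n_controls = 20000, 80000
    print("total N:    ", total_n(n_cases, n_controls))
    print("effective N:", effective_n(n_cases, n_controls))
```

With these made-up counts the total is 100000 while the effective sample size is about 64000, so an N column alone does not say which convention was used.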

mkoromina commented 2 years ago

Hi @omerwe,

Many thanks for this. From my understanding, N_cases and N_controls should be enough to proceed with munging the data. So do you suggest dropping the N column, or omitting the --n flag, and letting the munge_sumstats script infer N (the effective sample size) from N_cases and N_controls?

Thank you very much, Maria

omerwe commented 2 years ago

@mkoromina yes, I think that's the best approach.
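
For anyone landing here later, one way to act on this is to write a copy of the sumstats file without the N column before munging. The sketch below is only an illustration using the Python standard library. It assumes an uncompressed, tab-delimited file with a header row, and the function name `drop_column` is hypothetical, not part of PolyFun.

```python
import csv


def drop_column(in_path, out_path, column="N", delimiter="\t"):
    """Copy a delimited sumstats file, leaving out one column by header name.

    Assumptions: the file is uncompressed text, has a single header row,
    and uses `delimiter` between fields.
    """
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout, delimiter=delimiter, lineterminator="\n")

        header = next(reader)
        if column not in header:
            raise ValueError(f"column {column!r} not found in header")

        # Indices of every column except the one being dropped.
        keep = [i for i, name in enumerate(header) if name != column]

        writer.writerow([header[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])
```

Calling `drop_column("sumstats.tsv", "sumstats_no_N.tsv")` (both file names are placeholders) leaves N_cases and N_controls in place, so the munge script can then be run on the new file without the --n flag.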

mkoromina commented 2 years ago

Thanks very much @omerwe!