Closed by mkoromina 2 years ago
Hi @mkoromina,
I generally prefer that the munge_sumstats script avoid making decisions in the presence of ambiguity. If we have all three of N, N_cases, and N_controls, it's not clear what the relationship between these quantities is, and I don't want the script to assume too much.
Are there any circumstances where N_cases and N_controls are not enough?
Hi @omerwe,
Many thanks for this. From my understanding, N_cases and N_controls should be enough to proceed with munging the data. So do you suggest dropping the N column, or omitting the --n flag, and allowing the munge_sumstats script to infer N (effective sample size) from N_cases and N_controls?
Thank you very much, Maria
@mkoromina yes, I think that's the best approach.
Thanks very much @omerwe !
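For readers wondering what "effective sample size" could mean here: one common convention for case-control studies is N_eff = 4 / (1/N_cases + 1/N_controls). The sketch below only illustrates that convention. This thread does not confirm whether the munge script uses this formula or something else (for example, the plain sum), and the function name is hypothetical.

```python
def effective_sample_size(n_cases: float, n_controls: float) -> float:
    """Effective N under one common case-control convention.

    Assumption: N_eff = 4 / (1/n_cases + 1/n_controls).
    This is NOT confirmed to be what the munge script computes.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("n_cases and n_controls must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Made-up counts, for illustration only.
balanced = effective_sample_size(5000, 5000)
unbalanced = effective_sample_size(1000, 9000)
print(balanced, unbalanced)
```

Under this convention a balanced design has an effective N equal to the total N, while an unbalanced design with the same total has a smaller one. That is one reason an N column and N_cases/N_controls columns in the same file can disagree.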
Hi @omerwe
Just a brief enquiry regarding munging the data. I am running the munge polyfun script on a sumstats file that includes both an N column (reflecting the effective sample size) and N_cases and N_controls columns. I tried omitting the --n flag so that the effective sample size would be inferred automatically, but the script exits with the error 'cannot both have an N column and N_cases/N_controls columns in the sumstats file'. Moreover, when munging another sumstats file, I specified a number with the --n flag, but the job ran only once I omitted the flag: the file contained both N_cases and N_controls columns, and these conflicted with the value I passed.
Do you know what the best approach would be for munging sumstats files in these formats?
Thanks in advance!
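One possible workaround, following the approach agreed earlier in this thread (keep N_cases and N_controls, drop N), is to strip the N column from the file before munging and then run the script without --n. Below is a minimal sketch, assuming a tab-delimited file with a header row. The function name `drop_n_column` and the toy data are hypothetical and not part of PolyFun.

```python
import csv
import io


def drop_n_column(src, dst, delimiter="\t"):
    """Copy a delimited sumstats table from src to dst.

    The N column is omitted only when N_cases and N_controls are both
    present (header names are compared case-insensitively).
    Returns True if an N column was dropped.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    header = next(reader)
    names = [h.upper() for h in header]
    drop = "N" in names and "N_CASES" in names and "N_CONTROLS" in names
    # Indices of the columns to keep.
    keep = [i for i, h in enumerate(names) if not (drop and h == "N")]
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return drop


# Toy in-memory example with made-up values.
toy = "SNP\tN\tN_cases\tN_controls\nrs1\t3600\t1000\t9000\n"
out = io.StringIO()
dropped = drop_n_column(io.StringIO(toy), out)
print(out.getvalue())
```

For a real file, the same function can be called with two handles opened via `open(path, newline="")`. The cleaned file then contains only N_cases and N_controls, which should avoid the "cannot both have an N column" error quoted above.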