omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

per-SNP sample size N #161

Closed: piganoemi closed this issue 1 year ago

piganoemi commented 1 year ago

Hi,

I have meta-analysis summary statistics that I would like to use with the PolyFun+SuSiE implementation, but I got stuck with the finemapper.py script. The sample size N differs among SNPs. I included an N column in the file I pass to the --sumstats parameter, but it looks like finemapper.py does not run without a single integer (i.e. the same for all SNPs) for the --n parameter. So I can't work out how to tell the script that the N column is in the dataset and that it should use it as the sample size.

Is there a way to use a different value of --n for each SNP in finemapper.py? And if so, how can I do it?

Thanks in advance, Noemi

omerwe commented 1 year ago

Unfortunately we don't support a variable N column (neither SuSiE nor FINEMAP supports this, as far as I know).

If the numbers aren't too different, I would use the average N. If they are quite different, my guess is that this is meta-analyzed data, in which case the results may not be reliable anyway (please see Table 3 in the PolyFun paper and the associated text).
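A minimal sketch of the "use the average N" advice, assuming the per-SNP N values have already been read from the sumstats file into a Python list. The function name `summarize_sample_sizes` and the `max_ratio` cutoff for flagging heterogeneous N are hypothetical illustrations, not part of PolyFun; the rounded mean is what one would pass to `--n`, since that flag takes a single integer.

```python
import statistics


def summarize_sample_sizes(n_values, max_ratio=1.2):
    """Summarize per-SNP sample sizes and suggest one integer N for --n.

    max_ratio is a hypothetical cutoff (not from PolyFun): if max(N)/min(N)
    exceeds it, the sample sizes are flagged as heterogeneous, which hints
    that a single N may be a poor approximation (e.g. meta-analyzed data).
    """
    if not n_values:
        raise ValueError("no sample sizes given")
    n_min, n_max = min(n_values), max(n_values)
    mean_n = statistics.fmean(n_values)
    summary = {
        "min": n_min,
        "max": n_max,
        "mean": mean_n,
        "median": statistics.median(n_values),
        # --n expects a single integer, so round the mean
        "suggested_n": round(mean_n),
        "ratio_max_min": n_max / n_min,
    }
    summary["heterogeneous"] = summary["ratio_max_min"] > max_ratio
    return summary


if __name__ == "__main__":
    print(summarize_sample_sizes([50000, 52000, 48000]))
```

For `[50000, 52000, 48000]` the suggested N is 50000 and the spread is small, so averaging seems reasonable; a large `ratio_max_min` is the situation where the caveat about meta-analyzed data applies.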

Sorry I can't offer more help, please let me know if I can help with anything else!

piganoemi commented 1 year ago

Hi Omer,

Thanks for your answer! Yes, the data comes from a meta-analysis. I tried both the median N (which is also the maximum N) and the minimum N, using the pre-computed LD matrices and priors and setting the number of causal variants to 10. The PIPs from the two runs are quite similar. But now I wonder how reliable they are, as you pointed out (much appreciated), and whether I should set the number of causal variants to 1.
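One way to make "quite similar in terms of PIP" concrete is sketched below. It assumes the finemapper.py output of each run (median N and minimum N) has already been loaded into a dict mapping SNP ID to PIP; the exact output column names and the 0.5 PIP cutoff are assumptions, and `compare_pips` is a hypothetical helper, not part of PolyFun.

```python
def compare_pips(pips_a, pips_b, threshold=0.5):
    """Compare PIPs from two fine-mapping runs (e.g. median N vs minimum N).

    pips_a, pips_b: dicts mapping SNP ID -> PIP; only shared SNPs are used.
    threshold: hypothetical PIP cutoff for calling a SNP "fine-mapped".
    """
    shared = sorted(set(pips_a) & set(pips_b))
    if not shared:
        raise ValueError("the two runs have no SNPs in common")
    # absolute PIP difference per shared SNP
    diffs = {snp: abs(pips_a[snp] - pips_b[snp]) for snp in shared}
    max_snp = max(diffs, key=diffs.get)
    hits_a = {snp for snp in shared if pips_a[snp] >= threshold}
    hits_b = {snp for snp in shared if pips_b[snp] >= threshold}
    return {
        "n_shared": len(shared),
        "max_abs_diff": diffs[max_snp],
        "max_diff_snp": max_snp,
        "hits_only_a": sorted(hits_a - hits_b),
        "hits_only_b": sorted(hits_b - hits_a),
        "hits_both": sorted(hits_a & hits_b),
    }


if __name__ == "__main__":
    run_median_n = {"rs1": 0.90, "rs2": 0.10, "rs3": 0.55}
    run_min_n = {"rs1": 0.85, "rs2": 0.12, "rs3": 0.40, "rs4": 0.30}
    print(compare_pips(run_median_n, run_min_n))
```

Looking at which SNPs cross the cutoff in only one run is often more informative than the average PIP difference, since a few borderline SNPs can flip while most PIPs barely move.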

You might already know this, but just in case: I found an R function, POLYFUN() (in the 'echofinemap' package), which appears to compute per-SNP sample sizes for a PolyFun+SuSiE analysis (the 'compute_n' argument): [https://rdrr.io/github/RajLabMSSM/echofinemap/man/POLYFUN.html]. I haven't been able to use it yet because I'm having trouble installing the package. Have you ever tried it? Or do you think that using per-SNP sample sizes in a PolyFun+SuSiE analysis would give reliable results?

omerwe commented 1 year ago

Yes, setting L=1 is the safest option (though it comes at the price of lower power 🤷 )
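To illustrate what L=1 means, here is a sketch of a single-causal-variant model: PIPs are prior-weighted Bayes factors normalized across the SNPs in the region, computed from marginal z-scores alone, so no LD matrix is involved. That independence from LD is plausibly part of why L=1 is the safer choice for meta-analyzed data, and the model's inability to capture a second causal variant is where the lost power comes from. This uses Wakefield's approximate Bayes factor on standardized effects with sampling variance approximated as V = 1/n; it is not SuSiE's or PolyFun's actual implementation (SuSiE, for example, can estimate the prior effect variance), and `prior_var=0.005` is a hypothetical default.

```python
import math


def single_effect_pips(z_scores, n, prior_probs=None, prior_var=0.005):
    """PIPs under a single-causal-variant (L=1) model via Wakefield's ABF.

    z_scores: marginal z-scores, one per SNP.
    n: single sample size; the standardized effect's sampling variance is
       approximated as V = 1/n.
    prior_probs: per-SNP prior causal probabilities (e.g. functional priors),
       all > 0; uniform if None.
    prior_var: prior variance W of the standardized causal effect
       (hypothetical default).
    """
    m = len(z_scores)
    if prior_probs is None:
        prior_probs = [1.0 / m] * m
    v = 1.0 / n
    shrink = prior_var / (v + prior_var)  # W / (V + W)
    # log prior + log ABF, where
    # log ABF = 0.5*log(V/(V+W)) + 0.5*z^2*W/(V+W), and V/(V+W) = 1 - shrink
    log_w = [
        math.log(p) + 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink
        for z, p in zip(z_scores, prior_probs)
    ]
    top = max(log_w)  # subtract the max before exponentiating (log-sum-exp)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    # exactly one causal SNP is assumed, so PIPs sum to 1
    return [w / total for w in weights]


if __name__ == "__main__":
    print(single_effect_pips([5.0, 1.0, 0.5], n=50000))
```

Because N enters only through V, changing N rescales how sharply the z-scores are contrasted; with equal z-scores the functional priors alone decide the PIPs.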

Yes, I know the EcholocatoR package and have spoken with its author (Brian Schilder) quite a lot, though I'm not sure what this specific function does. He'll probably respond if you open an issue on its GitHub page.

piganoemi commented 1 year ago

OK, thanks again for your help and explanations!