privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/
183 stars 43 forks source link

Why do I get zero p-values? #476

Closed garyzhubc closed 4 months ago

garyzhubc commented 4 months ago

Zero p-values from:

gwas <- big_univLogReg(ukb_abca1$genotypes, 1-df[[var_name]], df$id, range)

and

snp_manhattan(gwas, ukb_x1$map$chromosome, ukb_x1$map$physical.pos)

In particular:

> min(predict(gwas, log10=FALSE), na.rm = TRUE)
[1] 0

So the plot looks like:

image

privefl commented 4 months ago

The p-value is so small that it cannot be stored as a number by computers. It has to be rounded to 0. This is why I usually work with log10(pval).

garyzhubc commented 4 months ago

Is it possible to get rid of that point (or make amendment) in the plot?

privefl commented 4 months ago

I would add + scale_y_log10(). But the -log10(pval) seems really large. If you want to remove it, you can probably do something like gwas$score[which.max(gwas$score^2)] <- 0.

garyzhubc commented 4 months ago

After adding + scale_y_log10() now I got a plot like this: image How to change the y-axis label to -log10(log10(p-value))?

Is this the right way to plot with the line of significance? The line unfortunately, didn't show up.

snp_manhattan(gwas, ukb_xxx$map$chromosome, ukb_xxx$map$physical.pos) + 
  scale_y_log10() + 
  geom_hline(yintercept = -log10(log10(5e-8)), linetype = 2, color = "red")
privefl commented 4 months ago

The y-values are -log10(pval), but on top you represent them on a log-scale (of y, so on a log-log-scale of pval), which is fine. Then the intercept is simply -log10(5e-8).

garyzhubc commented 4 months ago

Why isn't the intercept now in log-log-scale too?

privefl commented 4 months ago

What you plot is -log10(pval). You do not apply another log() to these, it is just that the values are now also spaced on a log scale (but unchanged).