stephenturner / qqman

An R package for creating Q-Q and manhattan plots from GWAS results
http://cran.r-project.org/web/packages/qqman/
GNU General Public License v3.0
156 stars 91 forks source link

highlight not working #13

Closed isp2 closed 10 years ago

isp2 commented 10 years ago

Hello,

I have gwas results from PLINK or in similar formats (with CHR, SNP, BP, and P columns). I've been using qqman to obtain qqplots and manhattan plots and I'm really pleased with this tool.

More recently I've decided to use qvalue R-package to determine the q-values corresponding to each p-value and add a "significant" column to the original gwas results table. Than I select all the SNPs that are significant at the level I chose as a cutoff (0.01 or 0.05 FDR) and generate a snps2highlight list using awk.

I've figured out that I need to make sure that the SNP column is read as a character and not a vector. And I'm using newtable <- rapply(originaltable, as.character, classes="factor", how="replace") to fix this.

However, every time I try to obtain a manhattan plot with the snps in the list highlighted I get the following error: Warning message: In manhattan(newplink3.qassoc, highlight = newsnps2highlight) : You're trying to highlight SNPs that don't exist in your results.

This is not true since I've made sure that the same SNP IDs are in both tables (gwas results and snps2highlight). Do you think this error as something to do with the factor to character transformation?

As an example here is a truncated list of snps to highlight: 1:23402516 1:23555692

And here is a small portion of the PLINK results table: CHR SNP BP NMISS BETA SE R2 T P 1 1:23187542 23187542 56 -0.00375 0.003287 0.02353 -1.141 0.259 1 1:23217117 23217117 56 -0.0004373 0.001809 0.001082 -0.2418 0.8098 1 1:23226619 23226619 56 0.0001129 0.001978 6.03e-05 0.05706 0.9547 1 1:23280906 23280906 56 0.003209 0.001849 0.05282 1.735 0.08837 1 1:23289961 23289961 56 0.004241 0.002476 0.05152 1.713 0.09251 1 1:23327004 23327004 56 -0.001758 0.002224 0.01143 -0.7902 0.4328 1 1:23342555 23342555 55 0.003464 0.001687 0.0737 2.054 0.04497 1 1:23396183 23396183 56 0.002054 0.002069 0.01792 0.9927 0.3253 1 1:23402516 23402516 56 0.01103 0.002603 0.2494 4.236 8.911e-05 1 1:23472663 23472663 56 0.008913 0.003017 0.1392 2.954 0.004632 1 1:23541892 23541892 56 -0.003914 0.002447 0.04522 -1.599 0.1156 1 1:23555692 23555692 56 0.01103 0.002603 0.2494 4.236 8.911e-05 1 1:23615921 23615921 56 0.003649 0.001715 0.0773 2.127 0.03801 1 1:23641150 23641150 54 -0.004567 0.00162 0.1325 -2.818 0.006815 1 1:23682442 23682442 56 0.0003377 0.001867 0.0006053 0.1809 0.8572 1 1:23711792 23711792 56 -9.768e-05 0.001778 5.587e-05 -0.05493 0.9564 1 1:23812059 23812059 55 -0.002473 0.003811 0.007882 -0.6489 0.5192 1 1:23821149 23821149 56 0.004194 0.00295 0.03608 1.422 0.1609 1 1:23886718 23886718 56 0.001551 0.002557 0.006767 0.6065 0.5467 1 1:23962323 23962323 56 -0.001824 0.001807 0.01851 -1.009 0.3173 1 1:23974308 23974308 55 0.01197 0.00292 0.2407 4.099 0.000143

stephenturner commented 10 years ago

I see no problem. Run the following code (the part at the beginning just recapitulates your data.

d <- structure(list(CHR = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), SNP = c("1:23187542", 
"1:23217117", "1:23226619", "1:23280906", "1:23289961", "1:23327004", 
"1:23342555", "1:23396183", "1:23402516", "1:23472663", "1:23541892", 
"1:23555692", "1:23615921", "1:23641150", "1:23682442", "1:23711792", 
"1:23812059", "1:23821149", "1:23886718", "1:23962323", "1:23974308"
), BP = c(23187542L, 23217117L, 23226619L, 23280906L, 23289961L, 
23327004L, 23342555L, 23396183L, 23402516L, 23472663L, 23541892L, 
23555692L, 23615921L, 23641150L, 23682442L, 23711792L, 23812059L, 
23821149L, 23886718L, 23962323L, 23974308L), NMISS = c(56L, 56L, 
56L, 56L, 56L, 56L, 55L, 56L, 56L, 56L, 56L, 56L, 56L, 54L, 56L, 
56L, 55L, 56L, 56L, 56L, 55L), BETA = c(-0.00375, -0.0004373, 
0.0001129, 0.003209, 0.004241, -0.001758, 0.003464, 0.002054, 
0.01103, 0.008913, -0.003914, 0.01103, 0.003649, -0.004567, 0.0003377, 
-9.768e-05, -0.002473, 0.004194, 0.001551, -0.001824, 0.01197
), SE = c(0.003287, 0.001809, 0.001978, 0.001849, 0.002476, 0.002224, 
0.001687, 0.002069, 0.002603, 0.003017, 0.002447, 0.002603, 0.001715, 
0.00162, 0.001867, 0.001778, 0.003811, 0.00295, 0.002557, 0.001807, 
0.00292), R2 = c(0.02353, 0.001082, 6.03e-05, 0.05282, 0.05152, 
0.01143, 0.0737, 0.01792, 0.2494, 0.1392, 0.04522, 0.2494, 0.0773, 
0.1325, 0.0006053, 5.587e-05, 0.007882, 0.03608, 0.006767, 0.01851, 
0.2407), T = c(-1.141, -0.2418, 0.05706, 1.735, 1.713, -0.7902, 
2.054, 0.9927, 4.236, 2.954, -1.599, 4.236, 2.127, -2.818, 0.1809, 
-0.05493, -0.6489, 1.422, 0.6065, -1.009, 4.099), P = c(0.259, 
0.8098, 0.9547, 0.08837, 0.09251, 0.4328, 0.04497, 0.3253, 8.911e-05, 
0.004632, 0.1156, 8.911e-05, 0.03801, 0.006815, 0.8572, 0.9564, 
0.5192, 0.1609, 0.5467, 0.3173, 0.000143)), .Names = c("CHR", 
"SNP", "BP", "NMISS", "BETA", "SE", "R2", "T", "P"), row.names = c(NA, 
-21L), class = "data.frame")

d$SNP <- as.character(d$SNP)
mysnps <- c("1:23402516", "1:23555692")
library(qqman)
manhattan(d, highlight=mysnps)

rplot

If you're using the dev version you can limit the x-axis:

manhattan(d, highlight=mysnps, xlim=c(22187530, 24974320))

rplot01

isp2 commented 10 years ago

Yes, with the data you generated is working. But not with my files. Is there a way to send you my files for you to try them out?

isp2 commented 10 years ago

Solved it. It was an issue with processing the data before using it for plotting. Thank you!