Hello, and thanks for making the PWCoCo resource available!
I've noticed something that I think may be a bug related to swapped SNP labels. Here is a toy example demonstrating the issue. It only runs PWCoCo on 3 SNPs, which we would obviously never do, but it recreates the problem I've observed in real runs of hundreds of variants. Coordinates are hg19:
Exposure:
1:27220521:T:G G T 0.1 -0.030001 0.003 0.0000000000000000000000001
1:27221963:T:C C T 0.1 -0.030002 0.003 0.0000000000000000000000001
1:27304568:A:G G A 0.7 0.015 0.002 0.00000000001
Outcome:
1:27220521:T:G G T 0.15 -0.021 0.01 0.05
1:27221963:T:C C T 0.15 -0.020 0.01 0.05
1:27304568:A:G G A 0.75 0.01 0.01 0.2
An excerpt from the log output is as follows:
[exposure] Total amount of SNPs matched from phenotype file with reference SNPs are: 3.
[exposure] Performing stepwise model selection on 3 SNPs; p cutoff = 5e-08, collinearity = 0.9 assuming complete LE between SNPs more than 10.0 Mb away).
[exposure] Selected SNP 1:27221963:T:C with chisq 100.01 and pval 1.51e-23.
[exposure] Selected entry SNP 1:27221963:T:C with cpval 9.22e-17.
[exposure] Selected entry SNP 1:27220521:T:G with cpval 2.00e+00.
[exposure] 1:27220521:T:G does not meet threshold
You can see that 1:27221963:T:C is selected as the first signal, which makes sense. But then it is also listed as the next 'Selected entry SNP' which doesn't make sense.
The .coloc and .cojo output files (below) indicate that the 2 conditionally independent SNPs selected by COJO are 1:27221963:T:C and 1:27304568:A:G. This makes sense, given that these 2 SNPs are not in LD (r2=0.007), and both have significant unconditioned p-values (the 3rd inputted SNP, 1:27220521:T:G, has near perfect LD with 1:27221963:T:C, so it makes sense it wouldn't be one of the independent signals).
Chr SNP bp refA freq b se p n freq_geno bC bC_se pC 1:27221963:T:C 1:27304568:A:G
1 1:27220521:T:G 27220521 G 0.1 -0.030001 0.003 1.51885e-23 617185 0.1504 -0.0318381 0.00300024 2.62499e-26 0.997871 0.00691254
1 1:27221963:T:C 27221963 C 0.1 -0.030002 0.003 1.51375e-23 617185 0.150307 -0.0318353 0.00300024 2.65143e-26 1 0.00688391
Based on this toy example, as well as occasional full scale analyses, I'm wondering if some sort of SNP label swapping is occurring (probably in the log file). In a larger-scale run (>100 variants), I have seen an example where the 2nd 'Selected entry SNP' (SNP2) is reported in the log file as significant after conditioning on the first selected SNP (SNP1), but then in the corresponding .cojo file that has conditioned only on the first selected SNP1, the pC for SNP2 is >0.05 (i.e., not significant), and a different SNP altogether (SNP3) has the smallest pC which is actually equal to the cpval shown in the log file for SNP2! Hence my thinking that perhaps SNP labels have been swapped somehow.
That is my main concern. Another thing that is pretty minor (I think) is that in the toy example above, the .cojo file results show that the pC value for 1:27220521:T:G is actually slightly smaller than the pC for 1:27221963:T:C, and yet the latter is selected as the lead SNP for that independent signal. I'm assuming this is just a rounding matter? These pC values are clearly not meaningfully different, I just want to be sure nothing unexpected is going on here.
Hello, and thanks for making the PWCoCo resource available!
I've noticed something that I think may be a bug related to swapped SNP labels. Here is a toy example demonstrating the issue. It only runs PWCoCo on 3 SNPs, which we would obviously never do, but it recreates the problem I've observed in real runs of hundreds of variants. Coordinates are hg19:
Exposure:
Outcome:
An excerpt from the log output is as follows:
You can see that 1:27221963:T:C is selected as the first signal, which makes sense. But then it is also listed as the next 'Selected entry SNP' which doesn't make sense.
The .coloc and .cojo output files (below) indicate that the 2 conditionally independent SNPs selected by COJO are 1:27221963:T:C and 1:27304568:A:G. This makes sense, given that these 2 SNPs are not in LD (r2=0.007), and both have significant unconditioned p-values (the 3rd inputted SNP, 1:27220521:T:G, has near perfect LD with 1:27221963:T:C, so it makes sense it wouldn't be one of the independent signals).
.coloc file
out.exposure.1:27221963:T:C.cojo
Based on this toy example, as well as occasional full scale analyses, I'm wondering if some sort of SNP label swapping is occurring (probably in the log file). In a larger-scale run (>100 variants), I have seen an example where the 2nd 'Selected entry SNP' (SNP2) is reported in the log file as significant after conditioning on the first selected SNP (SNP1), but then in the corresponding .cojo file that has conditioned only on the first selected SNP1, the pC for SNP2 is >0.05 (i.e., not significant), and a different SNP altogether (SNP3) has the smallest pC which is actually equal to the cpval shown in the log file for SNP2! Hence my thinking that perhaps SNP labels have been swapped somehow.
That is my main concern. Another thing that is pretty minor (I think) is that in the toy example above, the .cojo file results show that the pC value for 1:27220521:T:G is actually slightly smaller than the pC for 1:27221963:T:C, and yet the latter is selected as the lead SNP for that independent signal. I'm assuming this is just a rounding matter? These pC values are clearly not meaningfully different, I just want to be sure nothing unexpected is going on here.
Thank you!