jwr-git / pwcoco

Pair-wise conditional analysis and colocalisation
GNU General Public License v3.0

Clarification regarding which SNPs are ultimately conditioned on #13

Open astro-geno opened 5 months ago

astro-geno commented 5 months ago

The following relates to issue #9.

When PWCoCo identifies multiple independent signals (e.g., in the exposure dataset), the SNPs listed in the stepwise selection process in the log file ('Selected entry SNP...') are often not the same as the conditionally independent SNPs listed in the .coloc file and in the .cojo filenames and column names. I assume this is because the lead SNPs named in the .cojo filenames (also included as columns within each .cojo file, and included in the .coloc file) result from jointly conditioning on all stepwise-selected SNPs except one. That can yield a different lead SNP for the same independent signal than the stepwise-selected SNP shown in the log file, which has not necessarily been adjusted for all other signals. Is this correct?
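To make the scenario concrete, here is a toy sketch of what I suspect is happening. It is not PWCoCo's code and may not match its actual behaviour. The LD matrix, effect sizes, SNP names and the helpers `cond_scores` and `stepwise` are all made up for illustration, the conditional statistic is a simplified standardised approximation, and the final step assumes the stepwise-selected SNPs are the ones conditioned on.

```python
import numpy as np

# Hypothetical region with three SNPs; all values are invented for illustration.
snps = ["snpA", "snpB", "snpC"]
R = np.array([[1.0, 0.8, 0.5],   # LD (correlation) matrix
              [0.8, 1.0, 0.2],
              [0.5, 0.2, 1.0]])
true_joint = np.array([0.0, 0.30, 0.25])  # snpB and snpC are causal, snpA only tags them
b = R @ true_joint                        # marginal standardised effects


def cond_scores(b, R, cond):
    """Approximate |conditional effect| / sqrt(residual genotype variance)
    for every SNP, conditioning on the SNP indices in `cond`.
    Conditioned-on SNPs are given a score of 0."""
    scores = np.zeros(len(b))
    for j in range(len(b)):
        if j in cond:
            continue
        if not cond:
            scores[j] = abs(b[j])
            continue
        # Regression weights of SNP j on the conditioned-on SNPs
        w = np.linalg.solve(R[np.ix_(cond, cond)], R[cond, j])
        resid_effect = b[j] - w @ b[cond]
        resid_var = 1.0 - w @ R[cond, j]
        scores[j] = abs(resid_effect) / np.sqrt(resid_var)
    return scores


def stepwise(b, R, n_signals):
    """Greedy forward selection: add the top SNP given those already selected."""
    selected = []
    for _ in range(n_signals):
        selected.append(int(np.argmax(cond_scores(b, R, selected))))
    return selected


# Step 1: stepwise selection (what I assume the log file reports)
selected = stepwise(b, R, n_signals=2)

# Step 2: for each signal, condition on all OTHER selected SNPs and take the top SNP
leads = []
for k in selected:
    others = [s for s in selected if s != k]
    leads.append(int(np.argmax(cond_scores(b, R, others))))

print("stepwise-selected:", [snps[i] for i in selected])
print("lead after conditioning on the others:", [snps[i] for i in leads])
```

In this toy case the stepwise pass selects snpA and snpC, but once snpC alone is conditioned on, the top SNP for the first signal is snpB rather than snpA. That is the kind of mismatch I see between the log file and the .coloc/.cojo output.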

Now my main question: in the joint conditional analyses that control for all other independent signals to yield the .cojo files, is it the stepwise-selected SNPs (from the log file) that are conditioned on, rather than the lead SNPs identified in the .coloc file and .cojo filenames? Some text on the PWCoCo Wiki explaining how to interpret the .coloc output seems to suggest that the lead SNPs (those in the .coloc output) are conditioned on directly: "So, when reading the row with rs669162, this shows that rs1063125, rs11265334 and rs617698 were all conditioned upon." My guess, though, is that the stepwise-selected SNPs from the log file are the ones directly conditioned on in the final joint conditional analyses. Perhaps that quote from the Wiki doesn't necessarily mean that the lead SNPs rs1063125, rs11265334 and rs617698 were directly conditioned on, but rather that the signals they correspond to were controlled for (by directly conditioning on the stepwise-selected SNPs)?

Thanks for clarifying!