Closed Ojami closed 2 years ago
The approach we took in our paper is to fine-map each and every region that includes even one SNP with p-value < cutoff (regardless of where the SNP is found in the region), including overlapping regions. The reason is that for highly polygenic traits the concept of "lead SNP" is not well-defined. In other words, there could be many many causal SNPs across the genome, and choosing just one region to fine-map is too arbitrary to identify all of them. The results of fine-mapping are not accurate for SNPs at the edges of a fine-mapped region (because we ignore their LD-neighboring SNPs).
In practice we construct all 3Mb regions (starting 1Mb apart). However, the script drops regions that don't include even a single SNP with p-value < cutoff. More details in the paper :)
Makes sense! cheers!
Sorry for reopening this again, but I have kind of similar question for GW fine mapping, specially with aggregate_finemapper_results.py
when aggregating distant loci on the same chr. To explain this, I'm using biochemistry_Cholesterol.txt.gz from ployfun paper. If I sort it by chr and PIP:
CHR | BP | SNP | A1 | A2 | pip | causal_set |
---|---|---|---|---|---|---|
1 | 55505647 | rs11591147 | G | T | 1 | 1 |
1 | 55519174 | rs499883 | G | A | 1 | 3 |
1 | 220970095 | rs17850677 | T | A | 0.99919 | 2 |
1 | 55698134 | rs570635280 | C | T | 0.99425 | 2 |
1 | 220970028 | rs2642438 | A | G | 0.9824 | 1 |
1 | 55520864 | rs557435 | A | G | 0.88311 | 9 |
1 | 145827097 | rs71618972 | T | G | 0.83767 | -999 |
1 | 55521313 | rs472495 | G | T | 0.66015 | 5 |
Based on this table, I would (incorrectly) observe that rs17850677 and rs570635280 both belong to the same credible set, while they're not even LD friend. So, they belong to two different credible sets, and happened to have the same credible set index. This happens because aggregate_finemapper_results.py
aggregates all loci within the same chr into a single file, then it becomes difficult to group varinats based on their credible sets (as in this example). Of course this doesn't happen when user runs separate fine mapping per each loci.
Am I missing something or is this a bug with aggregate_finemapper_results.py
?
Thanks!
Hi @Ojami ,
Indeed, thanks for the bug report. Since we merge different fine-mapping outputs from different (possibly overlapping) regions, the ids of the credible sets are region-specific. To address this, I now updated the aggregate_finemapper_jobs script to explicitly list the region name in the credible set. Can you please git pull and try the latest version? I'll also update the wiki to explain this subtlety.
Perfect, thanks!
Hi,
Is there any specific reason why
create_finemapper_jobs.py
creates overlapping regions in--jobs-file
? For instance,head -2 polyfun_all_jobs.txt
give the same credible set. I assume this happens because some index signals (p <
--pvalue-cutoff
) within the region happen to be within [42000001, 45000001], while others fall within [43000001, 46000001]. Generally speaking, shouldn't the regions be mutually exclusive?Thanks/Oveis