Generates enrichment plots of GWAS P-value lists in DNase I hypersensitive sites (DHSs), like Fig. 5 of Maurano, Humbert, et al., Science 2012.
    cd src
    perl ./gwasVsRegions.pl -p pvalues.hg19.YOURSTUDY.bed5 -s ../hg19/namedFDR5pctHotspots.starch -r ../results_hotspots_nocoding
The GWAS P-value file is a tab-delimited file provided by the user. It includes the dbSNP ID in column 4, and the P-value in column 5; only the latter is actually used.
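To make the expected input layout concrete, here is a minimal Python sketch that reads such a file and collects column 5. It is not part of the pipeline. The function name `read_pvalues` and the example rows are made up for illustration, and the first three columns are assumed to follow the usual BED convention (chromosome, start, end).

```python
import csv
import tempfile

def read_pvalues(path):
    """Return the P-values (column 5) from a tab-delimited BED5 file."""
    pvalues = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment lines
            if len(row) < 5:
                raise ValueError(f"expected at least 5 columns, got {len(row)}: {row}")
            pvalues.append(float(row[4]))  # column 5 holds the P-value
    return pvalues

# Made-up rows for illustration only: chrom, start, end, dbSNP ID, P-value.
example = (
    "chr1\t1000\t1001\trs0000001\t0.5\n"
    "chr2\t2000\t2001\trs0000002\t1e-8\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".bed5", delete=False) as tmp:
    tmp.write(example)

pvalues = read_pvalues(tmp.name)
print(pvalues)
```

A quick check like this before running gwasVsRegions.pl can catch files that are space-delimited or missing the P-value column.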
The two key scripts are in src. The pipeline is split into a Perl script that computes the overlap and an R script that does the plotting.
namedFDR5pctHotspots.starch is a starch archive (see BEDOPS) containing the DHS master list.
Samples listed in excluded_samples.txt will be ignored.
(This script was written by Eric Haugen, UW)
Right now, only the x-axis upper limit and the number of cell types to label are parameterized on the command line. If you look inside, you'll see that the legendSamples list maps samples to group names and colors using regular expressions.
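The real legendSamples list lives in the R plotting script. The Python sketch below only illustrates the general idea of a first-match regular-expression lookup. The patterns, group names, colors, and the `classify` helper are all hypothetical and are not taken from the pipeline.

```python
import re

# Hypothetical entries (pattern, group name, color); NOT the pipeline's real list.
LEGEND_SAMPLES = [
    (r"^fetal_brain", "Fetal brain", "purple"),
    (r"^(th1|cd4)", "Immune", "blue"),
]
DEFAULT = ("Other", "grey")  # fallback for samples that match no pattern

def classify(sample):
    """Return (group, color) for the first pattern that matches the sample name."""
    for pattern, group, color in LEGEND_SAMPLES:
        if re.search(pattern, sample, flags=re.IGNORECASE):
            return group, color
    return DEFAULT

print(classify("CD4_donor1"))
print(classify("HepG2"))
```

Because the first matching pattern wins, more specific patterns would need to come before broader ones in such a list.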
This could be easily parallelized by chromosome.
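As a starting point, the P-value file could be split into one chunk per chromosome, as in the Python sketch below. The function name and output file names are hypothetical. Whether each chunk can then be passed separately to `-p`, and how the per-chromosome results would be merged, depends on how gwasVsRegions.pl aggregates its output, which is not covered here.

```python
import os
import tempfile
from collections import defaultdict

def split_by_chromosome(bed_path, out_dir):
    """Write one file per chromosome (column 1); return {chromosome: path}."""
    lines_by_chrom = defaultdict(list)
    with open(bed_path) as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                chrom = line.split("\t", 1)[0]
                lines_by_chrom[chrom].append(line)
    paths = {}
    for chrom, lines in lines_by_chrom.items():
        paths[chrom] = os.path.join(out_dir, f"{chrom}.bed5")
        with open(paths[chrom], "w") as out:
            out.writelines(lines)
    return paths

# Made-up rows for illustration only.
with tempfile.TemporaryDirectory() as work_dir:
    source = os.path.join(work_dir, "pvalues.bed5")
    with open(source, "w") as handle:
        handle.write(
            "chr1\t1000\t1001\trs0000001\t0.5\n"
            "chr2\t2000\t2001\trs0000002\t1e-8\n"
            "chr1\t3000\t3001\trs0000003\t0.01\n"
        )
    chunks = split_by_chromosome(source, work_dir)
    print(sorted(chunks))
```

The sketch keeps all lines in memory for simplicity, which should be fine for typical GWAS summary files but is a choice made here, not a requirement.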
Key variables for each plot need to be optimized by the user: