nsheff / LOLA

Locus Overlap Analysis: Enrichment of Genomic Ranges
http://code.databio.org/LOLA

Merging database region-sets #18

Open flopflip opened 7 years ago

flopflip commented 7 years ago

Is there a simple programmatic way to merge two region sets from regionDB into a new combined set within regionDB?

The only workaround I can think of seems complicated:

  1. extract the regions and merge them, save as new bed file
  2. create custom database with new bed file
  3. import the new database and merge it with the existing one

For example, if I want enrichment for regions that are DNase Weak OR UCSC CpG islands, I would have to extract the two sets from regionDB (or from the bed files), create a new bed file, and so on.
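Step 1 of the workaround above (taking the union of two region sets and writing it out as BED) can be sketched as follows. This is only an illustration in plain Python, not LOLA code: the function names `merge_region_sets` and `to_bed` are hypothetical, regions are assumed to be `(chrom, start, end)` tuples with BED-style half-open coordinates, and adjacent regions are merged along with overlapping ones.

```python
from collections import defaultdict


def merge_region_sets(set_a, set_b):
    """Union two lists of (chrom, start, end) regions, merging overlaps.

    Hypothetical helper for illustration; it is not part of LOLA.
    """
    # Pool both sets and group the intervals by chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in list(set_a) + list(set_b):
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        # Sweep intervals in start order, extending the current block
        # while the next interval overlaps or touches it.
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


def to_bed(regions):
    """Format regions as tab-separated, three-column BED text."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)
```

The resulting text could then be saved as the new bed file for steps 2 and 3. Chromosomes are sorted lexicographically here, which differs from the usual karyotype order.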

Seems like there should be a simpler way?

nsheff commented 7 years ago

Interesting. I have never tried to do an analysis like this. There is no simple way to do it; at the moment, your method seems to me to be the best option.

Do you have a suggestion for how such a thing could work more easily?

The only other approach I can think of is to run your enrichments on the two region sets separately, then combine their support, b, c, and d values as you want and re-run Fisher's test on the combination. It wouldn't be exactly right, though, because a region that overlaps both sets would get counted twice. It may still work, depending on the exact datasets.
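A rough sketch of that second idea, under stated assumptions: it is plain Python rather than LOLA code, the names `combine_counts` and `fisher_greater` are hypothetical, and the combination rule (summing support and b, then recomputing c and d from the fixed user-set and universe sizes) is just one possible reading of "combine as you want". The test is a one-sided (enrichment) Fisher's exact test on the table `[[support, b], [c, d]]`.

```python
from math import comb


def combine_counts(first, second):
    """Naively combine two (support, b, c, d) tuples for the same user set.

    Hypothetical rule: add support and b, then recompute c and d so the
    user-set and universe totals stay fixed. Regions that overlap both
    database sets are counted twice, so support and b are upper bounds.
    """
    a1, b1, c1, d1 = first
    a2, b2, c2, d2 = second
    user_total = a1 + c1
    universe_total = a1 + b1 + c1 + d1
    if (a2 + c2, a2 + b2 + c2 + d2) != (user_total, universe_total):
        raise ValueError("both results must share the same user set and universe")
    # Cap the sums so double counting cannot exceed the available regions.
    a = min(a1 + a2, user_total)
    b = min(b1 + b2, universe_total - user_total)
    return a, b, user_total - a, universe_total - user_total - b


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(support >= a) with fixed margins."""
    row1 = a + b          # regions overlapping the database set
    col1 = a + c          # regions in the user set
    total = a + b + c + d
    # Hypergeometric upper tail, computed with exact integer arithmetic.
    tail = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / comb(total, col1)
```

For example, `fisher_greater(*combine_counts((10, 20, 90, 880), (5, 30, 95, 870)))` tests the combined table `(15, 50, 85, 850)`. The exact-integer tail sum is fine for small tables but slow for genome-scale universes, where a statistics library would be the better choice.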

nsheff commented 7 years ago

Ok, seems possible. I'll keep this on the back burner and next time I work on LOLA I'll take a look at it.