rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Burden/skato testing of rare pLOFs - testing pathways rather than genes? #466

Open ggstatgen opened 10 months ago

ggstatgen commented 10 months ago

Dear Joelle

We're using Regenie to perform standard gene-level analyses (burden, skato etc) from population-scale WES data.

We'd like to begin exploring testing entire pathways using the same method.

From a practical point of view, I guess we would just produce a set of .annotations and .setlist files in which the chosen likely pathogenic variants are assigned to arbitrary pathway (rather than gene) names. This pathway information could be extracted from a number of sources of biological information.

The issue we're having is that, most of the time, such pathways are composed of genes on different chromosomes. The named 'pathway' entity would therefore appear in multiple .annotations and .setlist files. We have found this to be an issue: Regenie seems to skip testing this entity.

One way to get around this would, I think, be to merge the annotation files. Is there a less cumbersome way to do this without editing the structure of the annotation files?

joellembatchou commented 10 months ago

Hi,

Why would you have multiple setlist files? The set name (column 1) would correspond to the pathway name, column 2 would have a chromosome value for the pathway (see my comment here on some limitations with genes across multiple chromosomes) and column 4 would have the list of variants belonging to genes in the pathway. As long as each pathway name is unique, you can combine all the pathways in the same setlist file. The same goes for the variant annotations of each pathway in the annotation file, where column 2 would be the pathway name instead of the gene name.

Please let me know if there is any confusion on the above.

Cheers, Joelle
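As an illustration of the layout described above, here is a minimal Python sketch that turns gene-level rows into pathway-level setlist and annotation lines. It is not part of regenie: the function names, the `gene_to_pathway` mapping (one pathway per gene) and the variant IDs are hypothetical. It also assumes that the first gene seen for a pathway supplies the chromosome and position written to columns 2 and 3, so the limitations mentioned for sets spanning several chromosomes still apply.

```python
def build_pathway_setlist(gene_rows, gene_to_pathway):
    """Collapse gene-level setlist rows into one row per pathway.

    gene_rows: iterable of (gene, chrom, pos, [variant IDs]).
    gene_to_pathway: dict mapping gene name -> pathway name (hypothetical input).
    Returns tab-separated lines: pathway, chrom, pos, comma-separated variants.
    """
    pathways = {}
    for gene, chrom, pos, variants in gene_rows:
        pathway = gene_to_pathway.get(gene)
        if pathway is None:
            continue  # gene not assigned to any pathway
        # Assumption: the first gene seen fixes the pathway's chrom/pos columns.
        entry = pathways.setdefault(
            pathway, {"chrom": chrom, "pos": pos, "variants": []}
        )
        for variant in variants:
            if variant not in entry["variants"]:
                entry["variants"].append(variant)
    return [
        "\t".join([name, str(e["chrom"]), str(e["pos"]), ",".join(e["variants"])])
        for name, e in pathways.items()
    ]


def build_pathway_annotations(annotation_rows, gene_to_pathway):
    """Rewrite column 2 of annotation rows from gene name to pathway name.

    annotation_rows: iterable of (variant, gene, category).
    """
    lines = []
    for variant, gene, category in annotation_rows:
        pathway = gene_to_pathway.get(gene)
        if pathway is not None:
            lines.append(f"{variant}\t{pathway}\t{category}")
    return lines


# Made-up example data: two genes on different chromosomes, one pathway.
gene_rows = [
    ("GENE_A", 1, 1000, ["1:1000:A:G", "1:1050:C:T"]),
    ("GENE_B", 7, 5000, ["7:5000:G:A"]),
]
annotation_rows = [
    ("1:1000:A:G", "GENE_A", "pLoF"),
    ("7:5000:G:A", "GENE_B", "pLoF"),
]
gene_to_pathway = {"GENE_A": "PATHWAY_X", "GENE_B": "PATHWAY_X"}

setlist_lines = build_pathway_setlist(gene_rows, gene_to_pathway)
annotation_lines = build_pathway_annotations(annotation_rows, gene_to_pathway)
```

Each pathway ends up on a single setlist line with a unique name, which is the condition stated above for keeping all pathways in one file.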

ggstatgen commented 10 months ago

Hi Joelle

I'm running a per-chr analysis (UKBB) and I was under the impression that every per-chr Regenie phase II job would need to have its own .annotation and .setlist file related to the same chromosome.

If I understand correctly, you mean I could just merge those 22 .annotations files into one pan-chr file and pass it to each job (and do the same with the .setlist files)?

Many thanks

joellembatchou commented 10 months ago

Yes, you can have sets from different chromosomes in the same setlist and annotation files.

Kind regards, Joelle
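For the per-chromosome setup discussed in this thread, the merge could be sketched in Python as below. This is a rough illustration under assumptions, not a regenie utility: the function names are hypothetical, the files are assumed to be whitespace-separated, and a set name that appears in several per-chr setlists is collapsed into one row that keeps the chromosome and position of its first occurrence.

```python
def merge_setlists(per_chrom_lines):
    """Merge setlist lines from several files, one output row per set name.

    per_chrom_lines: iterable of iterables of lines, e.g. one list per chromosome.
    Rows sharing a set name are combined by pooling their variant lists.
    """
    merged = {}
    for lines in per_chrom_lines:
        for line in lines:
            fields = line.split()
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            name, chrom, pos, variants = fields
            # Assumption: keep chrom/pos from the first row seen for this set.
            entry = merged.setdefault(name, [chrom, pos, []])
            for variant in variants.split(","):
                if variant not in entry[2]:
                    entry[2].append(variant)
    return [
        "\t".join([name, chrom, pos, ",".join(variants)])
        for name, (chrom, pos, variants) in merged.items()
    ]


def merge_annotations(per_chrom_lines):
    """Concatenate annotation lines, dropping blanks and exact duplicates."""
    seen = set()
    merged = []
    for lines in per_chrom_lines:
        for line in lines:
            line = line.strip()
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged


# Made-up per-chromosome content; in practice each list would be read from a file.
chr1_sets = ["PATHWAY_X 1 1000 v1,v2", "GENE_C 1 2000 v3"]
chr7_sets = ["PATHWAY_X 7 5000 v4"]
merged_sets = merge_setlists([chr1_sets, chr7_sets])

chr1_annot = ["v1 PATHWAY_X pLoF", "v2 PATHWAY_X pLoF"]
chr7_annot = ["v4 PATHWAY_X pLoF", "v1 PATHWAY_X pLoF"]
merged_annot = merge_annotations([chr1_annot, chr7_annot])
```

The merged lines can then be written once and the same pair of files passed to every per-chr job, as confirmed above.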