Documentation is a bit unclear

sriramlab / FAME

This is the software for FAst Marginal Epistasis test (FAME)

MIT License


Documentation is a bit unclear #1

Closed scienception closed 3 months ago

scienception commented 6 months ago

Hi! Could you please explain in more detail what these flags are for?

-gxgbin: the bin index to compute the ME effect
-k: number of random vector
-jn: number of jackknife blocks
-annot: annotation file

Is the annotation file the one used in this paper to include "the carrier status at the target gene, and genotypes that potentially interact with the target feature"? The paper mentions an extension beyond common variants as implemented in FAME. I'm not sure whether this information is included in the annotation file or whether this modified version of FAME has not been released yet.

Thank you! :)

FBoyang commented 6 months ago

Thanks for your interest in our software.

You are correct; it wasn't released. We just added the software we used for the paper you mentioned to the branch "extTarget". Please check the corresponding readme for more details.

To answer your question: -annot: the path to the annotation file. Intuitively, the annotation file specifies how the genotype matrix is partitioned by features. We added more explanation to the README. Please check whether the new description is clear.

-gxgbin: you need to provide an annotation file (MxK) with K bins across the M features. The value of gxgbin is the column index of the annotation file, counted from 0. For example, if gxgbin=1 and the annotation file has 100 "1"s in column 2, then the ME effect of the target feature interacting with the 100 corresponding SNPs will be computed.
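A minimal sketch of the column selection described above, in plain Python. It assumes the annotation is a 0/1 matrix with one row per SNP and one column per bin, and that gxgbin counts columns from 0 (which the gxgbin=1 / column 2 example suggests). The function `snps_in_bin` and the toy matrix are hypothetical and are not part of FAME.

```python
def snps_in_bin(annot_rows, gxgbin):
    """Return the row indices (SNPs) annotated with 1 in column `gxgbin` (0-based)."""
    return [i for i, row in enumerate(annot_rows) if row[gxgbin] == 1]


# Toy M x K annotation: M = 5 SNPs (rows), K = 2 bins (columns).
annot = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 0],
]

# gxgbin=1 selects the second column, i.e. SNPs 2 and 3 in this toy example.
selected = snps_in_bin(annot, gxgbin=1)
print(selected)  # [2, 3]
```

Under these assumptions, the ME effect would be computed between the target feature and the SNPs listed in `selected`.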

-k, -jn: you can set both flags to 100 for a simple analysis. More details can be found at: https://doi.org/10.1038/s41467-020-17576-9

scienception commented 4 months ago

Thanks for making this branch available.

I'm still a bit puzzled about which file corresponds to which.

In the paper you define a C_t variable like this:

Ct represents the target pseudo gene of interest and denotes whether individuals carry a burden of pathogenic variants at the target gene t, defined as follows: Ct=0 if individual i does not carry any relevant pathogenic variants at the target gene(s), and Ct=1 if individual i carries at least one pathogenic variant at the target gene(s) t.

Where exactly is this file? With which flag do you call it? I see a target.txt file but they seem to be continuous values.

Then multi-annot.txt is the one to divide G into G1 and G2 (close vs long-distant SNPs). Correct me if I'm wrong.

Finally gxgbin in the paper would be 2 since E_t=C_t*G_2? (You're measuring interaction between target and distal SNPs)?
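The setup asked about in this comment (a near/distal split of G plus an interaction term E_t = C_t * G_2) could be sketched as follows. This only illustrates the idea and makes no claim about how FAME computes it internally. The positions, the window size, the two-column layout, the 0-based gxgbin convention, and both function names are assumptions made for the example.

```python
def build_annotation(snp_positions, target_start, target_end, window):
    """Two-column 0/1 annotation: column 0 = SNPs near the target, column 1 = distal SNPs."""
    rows = []
    for pos in snp_positions:
        near = target_start - window <= pos <= target_end + window
        rows.append([1, 0] if near else [0, 1])
    return rows


def interaction_features(target, genotypes, annot, gxgbin):
    """Element-wise product of the target value with each SNP in the chosen bin."""
    cols = [j for j, row in enumerate(annot) if row[gxgbin] == 1]
    return [[c * g_row[j] for j in cols] for c, g_row in zip(target, genotypes)]


# Hypothetical data: 5 SNPs, target gene spanning positions 1000-2000.
positions = [100, 950, 1500, 5000, 9000]
annot = build_annotation(positions, target_start=1000, target_end=2000, window=100)

# 3 individuals x 5 SNPs, and a binary carrier status C_t per individual.
genotypes = [
    [0, 1, 2, 1, 0],
    [2, 0, 1, 0, 1],
    [1, 1, 0, 2, 2],
]
carrier = [1, 0, 1]

# Interaction of the target with the distal bin (second column, index 1).
E = interaction_features(carrier, genotypes, annot, gxgbin=1)
```

Non-carriers (C_t = 0) get all-zero interaction features here, which is one reason a standardized target may be preferable in practice.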

FBoyang commented 3 months ago

Hi,

Sorry for the delayed reply. You are correct that it should be the target.txt file. In this synthetic setting it was continuous, but you can also provide the standardized binary file.
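As a rough illustration of what "standardized binary" could mean here, the sketch below centers and scales a 0/1 carrier indicator to mean 0 and variance 1. It assumes the population standard deviation (dividing by n) and a hypothetical helper name `standardize`. The exact layout FAME expects for target.txt is not specified in this thread, so the sketch stops at the values.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to unit variance (population SD, dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        raise ValueError("cannot standardize a constant vector")
    sd = math.sqrt(var)
    return [(v - mean) / sd for v in values]


# Hypothetical carrier status C_t for 8 individuals (1 = carries a pathogenic variant).
carrier = [0, 0, 1, 0, 1, 0, 0, 0]
z = standardize(carrier)
```

After standardization, carriers and non-carriers each map to a single value, so the vector is still two-valued even though it is no longer 0/1.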

"Then multi-annot.txt is the one to divide G into G1 and G2 (close vs long-distant SNPs)" That's correct. We typically use multi-annot for the purpose of excluding LD block from the analysis. But you can also do other types of analysis.

" (You're measuring interaction between target and distal SNPs)?": That's also correct