Closed gaow closed 6 years ago
I suggest instead operating directly on res, and we can just extend susie_in_CS to compute purity (if X or LD is provided) and to filter if purity_threshold is provided.
so susie_in_CS(res, LD_mat, purity_threshold)
maybe susie_get_CS
is better? verb (get) better than adjective (in)?
we could have a separate function purity(sets,LDmat) that does the work?
On Fri, Jun 8, 2018 at 12:18 PM, gaow notifications@github.com wrote:
Currently I compute "purity" of sets separately based on LD and apply that to susie sets (too bad I coded it in Python). We should implement it as a separate function here. What should it be like? eg,
susie_get_sets(susie_in_CS(res), LD_mat = NULL, threshold = 0.2)
Or,
susie_get_sets(susie_in_CS(res), X = NULL, threshold = 0.2)
and we compute LD mat?
When LD mat is NULL, we just get the position of the variables for each set and report them as an R list of sets. With LD_mat filter we additionally compute minimal pairwise LD and remove the sets that fails the threshold?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stephenslab/susieR/issues/19, or mute the thread https://github.com/notifications/unsubscribe-auth/ABt4xTYY3PW1u4qMLwEzM-8igxJ34j7Hks5t6rHbgaJpZM4UgqtC .
@stephens999 sorry I just cleaned up mostly my todo list on simulation so getting back to this one now. I think currently we use in_CS
to refer to a binary vector of length p
with 1
for a variable in CS. We did not provide clusters of the exact position of variables. That would be more like get_CS
.
How about we then make in_CS
as well as n_in_CS
not exported function, and implement get_CS
to return a list of actual CS? I guess in_CS
is not that useful.
BTW n_in_CS
can be trivially computed from get_CS
or in_CS
results so we still decide not to export it, right?
we could have a separate function purity(sets,LDmat) that does the work?
I agree it is cleaner to have a separate function for developers when we want to explore how "purity" works. But I think now that we have some idea about it through separate studies, then here from user's prospective having less functions is better. So I'm voting for:
susie_get_CS(res, X = NULL, R = NULL, threshold = 0.2)
X
or R
is provided we use LD filter. and maybe call it correlation
or r2
-- better not call it LD_mat
for context outside genetic fine-mapping. What's your pick? The default threshold 0.2 is chosen from our experience in genetic mapping studies.X
and R
are provided, throw error. Does it sound Okay?
I think correlation, X'X, is more natural to provide than squared correlation?
So maybe Xcor ?
Cool, Xcor
it is!
Currently I compute "purity" of sets separately based on LD and apply that to susie sets (too bad I coded it in Python). We should implement it as a separate function here. What should it be like? eg,
Or,
and we compute LD mat?
When LD mat is NULL, we just get the position of the variables for each set and report them as an R list of sets. With LD_mat filter we additionally compute minimal pairwise LD and remove the sets that fails the threshold?