Closed Hepit closed 6 months ago
Hi @Hepit,
Thanks for reaching out. MetaSTAAR_merge_simulation()
is specifically designed for performing simulation studies in the MetaSTAAR paper. It has the same core structure as MetaSTAAR_merge()
but was simplified so that it could directly take the loaded sumstat.list
and cov.list
objects as input.
If you are running a meta-analysis from real data, please feel free to contact us via email and we can share more details.
Best, Xihao
@xihaoli Thanks for the quick response ! Until I need to share data, I might keep the conversation here, if that is ok with you, just in case other people have the same questions I do.
Understood about MetaSTAAR_merge_simulation()
. In that case, let me formulate my question differently : what is the sequence of functions that should be called if one is trying to use this R package to perform meta-analysis?
Currently, I have run MetaSTAAR_worker_cov()
& MetaSTAAR_worker_sumstat()
on my variantset for each of my two cohorts; I then saved the 4 resulting R objects as .rda files (using save()
.) My understanding is that the next step would be to pass them to MetaSTAAR_merge()
; is this correct? If so, how can I do this, given that there is no filename argument in MetaSTAAR_merge()
? If not, what would be the next step?
Thanks a lot for your time and advice !
Hi @Hepit,
Sure no problem. In this case, let me ask a quick question. For MetaSTAAR_worker_cov()
, how did you specify the input parameter of region_midpos
? In other words, is the output of the function (sparse weighted LD matrix) a square (symmetric) matrix or a rectangle matrix? Knowing this would be more informative in answering your question.
Best, Xihao
Hi @xihaoli,
I wasn't sure exactly what was wanted for that parameter, so the value I used was simply the median position of the variant set - but if there's a better way to go about it, I'm very happy to redo it.
I took a look at the .rda cov file, and the matrix is rectangular.
Hi @Hepit,
Thanks for letting me know. MetaSTAAR has several general use cases:
(1) generate the cov files for all possible variants across the genome, so that meta-analysis of any variant set can be default after the cov files are shared. In this case, the cov files are rectangular matrices and region_midpos
should be the midpoint position of the longer arm of the rectangle;
(2) generate the cov files for pre-defined variant sets, so that meta-analysis is performed for the underlying variant sets. In this case, the cov files are square matrices. Note that this is a special case of (1) by setting region_midpos
in MetaSTAAR_worker_cov()
to be the position of the last variant in the set.
To clarify, MetaSTAAR_merge()
assumes that the cov files are rectangular, generated from the general case (1) above; whereas MetaSTAAR_merge_simulation()
assumes that the cov files are square, generated from the case (2) above. If you have already had your variant set defined and genotype extracted, you can use MetaSTAAR_worker_cov()
to generate the square cov matrices by setting region_midpos
to be the last variant's position (see the file type1_sim_cluster_2kb_arrayjob_binary_1e9_N=50000_meta.R
as an example), and then run MetaSTAAR_merge_simulation()
for your meta-analysis.
Hope this helps.
Best, Xihao
Hi @xihaoli,
Thanks, that does clarify a few things ! Still a few points of confusion though :
In case (1), what does the "rectangle" (and its "longer arm") refer to? If I'm passing an entire chromosome to MetaSTAAR_worker_cov()
, is the midpoint simply the position in the middle of the chromosome?
Also regarding case (1) : does this mean I can generate a single cov file for the whole genome, and define my variant sets of interest later? If so, when (in which function) do I define them? And should I still use MetaSTAAR_merge_simulation()
if I go this route?
Sorry for all these questions, and thanks a lot for your patience in answering them !
Hi @Hepit,
The "rectangle" (and its "longer arm") refers to the illustration in Panel b of the Workflow Figure of the MetaSTAAR paper. The segment.size
parameter in the MetaSTAAR_worker_cov()
function specifies the chunk size. The midpoint would then be the midpoint position of the segment, which can also be illustrated in the workflow figure.
Yes, you can generate a single cov file for the whole genome, but it also depends on how large your dataset is. In practice, you can generate cov matrices with sufficiently large bandwidth (segment.size
) so that they are large enough to cover all variant sets afterward. We have been preparing a follow-up package that utilizes the MetaSTAAR
package as a building block and automates all the processes from generating summary stat / cov files to defining the variant sets and performing meta-analysis. The package is in the final testing stage, and I can add you to the repository so that you can make use of the full functionality there.
Let me know your plan.
Best, Xihao
Hi @xihaoli,
Thanks for the explanation, that clarifies so much !
I'd love to be added to the repo. Thanks again !
Hello,
I've generated sumstat & cov files for my two cohorts using
MetaSTAAR_worker_cov()
&MetaSTAAR_worker_sumstat()
, but I'm now unsure how to feed those files toMetaSTAAR_merge()
, which seems to be the next step in the pipeline. The main point of my confusion is thatMetaSTAAR_merge()
has no argument where I can specify the name under which I saved the sumstat & cov files; it only has arguments where I can specify the directory. (For the record, I'm using version 0.9.6.3 of the R package.)I also notice that the scripts that accompany your 2023 article don't seem to make use of
MetaSTAAR_merge()
at all, and rather use a customMetaSTAAR_merge_simulation()
function instead...Could you shed some light on how these functions are supposed to be used?
Thanks !