xihaoli / STAARpipeline

An R package for performing association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using STAARpipeline
GNU General Public License v3.0
58 stars 18 forks source link

Null output in Dynamic window analysis #2

Closed SheaCheng2000 closed 1 year ago

SheaCheng2000 commented 1 year ago

Hi,

Recently I am using your pipeline on Docker to do burden test with dynamic window analysis.

After generating aGDS file, I conducted step0-step1-step5. I tested with WES data of 6 people (2 trios) with a chr1 region. Here is my original vcf file, generated aGDS file and commands: dynamic_window_test.zip

As I only tested one chromosome, I changed some lines of codes:

#### Number of jobs for each chromosome
jobs_num <- matrix(rep(0,3),nrow=1)
for(chr in 1:1)
{
    print(chr)
    gds.path <- agds_dir[1] # agds_dir[1]
    genofile <- seqOpen(gds.path)

    filter <- seqGetData(genofile, QC_label)
    SNVlist <- filter == "PASS" 

    position <- as.numeric(seqGetData(genofile, "position"))
    position_SNV <- position[SNVlist]

    jobs_num[chr,1] <- chr
    jobs_num[chr,2] <- min(position[SNVlist])
    jobs_num[chr,3] <- max(position[SNVlist])

    seqClose(genofile)
}

About groupid and arraryid in step 0, I directly used groupid = arraryid = scang_num.

Though no errors were reported,the output was null. I am not sure if these changes were correct. Have no idea which step went wrong. Could you give me some advice?

Thanks!

xihaoli commented 1 year ago

Hi Shea,

One possible reason for the null result is the low MAF cut-off. The default MAF cut-off is 0.05 in the SCANG procedure, which means we filter out the variants with MAF > 0.05. In your case, the same size is 6, and the lowest MAF is 1 / (2 * 6) = 0.083 > 0.05, indicating that filter out all variants. Hence you got the null results. One possible solution is to set a larger MAF cut-off.

Thank you very much.

Best, Xihao

alohasiqi commented 1 year ago

@SheaCheng2000 Hello Shea,

Just realized we had the same issue with the null/small number of outputs in dynamic window analyses. See https://github.com/xihaoli/STAARpipeline-Tutorial/issues/20

I'm wondering if you had successfully solved this with the modified MAF cutoff or other methods. Thanks!

SheaCheng2000 commented 1 year ago

@alohasiqi Hi alohasiqi,

Thanks for your concern!

I think the problem was indeed caused by the small sample size. So I changed to use the toy example. See https://github.com/xihaoli/STAARpipeline-Tutorial/issues/8. It ran successfully and gave the output of scang_result. Then I used the in-house dataset with 100 samples, and also got the output.

Shea

alohasiqi commented 1 year ago

Thanks for the feedback @SheaCheng2000! It's good to hear you had some results (maybe not a lot of significant ones due to the small sample size). I was worried initially about some "errors" in my implementation as only a small number of outputs were generated but it looks like that's only something about the power.