xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0
21 stars 17 forks source link

fitnullmodel error #12

Closed DamienTan closed 1 year ago

DamienTan commented 1 year ago

Dear Dr. Li, I am facing another problem when I ran fitnullmodel step using STAARpipeline_Null_Model_GENESIS.r, there was an error:


Error in .checkSampleId(cov.mat, x) : all sample names in dimnames of cov.mat must be present in x$sample.id Calls: fitNullModel -> fitNullModel -> .local -> .checkSampleId Execution halted


the cov.mat points sGRM.RData, so I check the content of the sGRM.Rdata. I find the sGRM@Dimnames[[1]] output looks strange:


"D2109086378_D2109086378" "D2109086379_D2109086379" "D2109086380_D2109086380"


It looks paste FID and IID using "_" as the rownames and colnames of the sparse GRM. Not sure if I understand correctly? After I did an operation to delect the "_" and the string after using sGRM@Dimnames[[1]] <- gsub("_D[0-9]+", "", sGRM@Dimnames[[1]]), I ran fitnullmodel again, but It still occurred the same error.

for "x", I think it is the phenotype I am concerned. Is it ringht? Meanwhile, I am not sure the phenotype file should include what columns? Here are the colnames in my phenotype file:


FID IID sex age first_transfusion PLT PLT_zscore pc1 pc2 pc3 pc4 pc5 pc6 pc7 pc8 pc9 pc10 pc11 pc12 pc13 pc14 pc15 pc16 pc17 pc18 pc19 pc20


the first two columns are FID and IID and they are indentical, Not sure it should remove one of them and rename the colname of the remain to sample.id?

xihaoli commented 1 year ago

Hi Damien,

Thanks for your question. Regarding the rownames/colnames of the sparse GRM, please add the following script before fitting the null model:

sample_id <- unlist(lapply(strsplit(colnames(sGRM),"_"),`[[`,1))
colnames(sGRM) <- sample_id
rownames(sGRM) <- sample_id

Regarding the fitNullModel() function, please refer to the GENESIS package manual for detailed specifications. Alternatively, you may follow the STAARpipeline_Null_Model.r scripts to fit the null model for STAARpipeline.

Best, Xihao

DamienTan commented 1 year ago

Hi Damien,

Thanks for your question. Regarding the rownames/colnames of the sparse GRM, please add the following script before fitting the null model:

sample_id <- unlist(lapply(strsplit(colnames(sGRM),"_"),`[[`,1))
colnames(sGRM) <- sample_id
rownames(sGRM) <- sample_id

Regarding the fitNullModel() function, please refer to the GENESIS package manual for detailed specifications. Alternatively, you may follow the STAARpipeline_Null_Model.r scripts to fit the null model for STAARpipeline.

Best, Xihao

Very appreciative of your help. I found that some of sample ids in my data are different, such as DT2007037898-1_DT2007037898-1. They could not be split by using gsub("_D[0-9]+", "", sGRM@Dimnames[[1]]). But it works by using your scripts. And I have done this fitnullmodel step already. Following the STAARpipeline-tutorial, I ran the individual analysis. When I ran the command Rscript STAARpipeline_Individual_Analysis.r 14, the output file size is very small. That was strange and maybe there was something wrong.


4.0K -rw-rw-r--. 1 cxm cxm 90 Dec 9 20:09 plt_results_individual_analysis_14.Rdata


BTW, I still could NOT understand the arrayid #7. I ran analysis in DELL PowerEdge T640 tower server, without any cluster. What I should do to adapt the script? Hope to have your generous help and thanks again for your previous help.

xihaoli commented 1 year ago

Hi Damien,

Glad that your previous question has been resolved. For your follow-up question, could you please paste the header and dimension of plt_results_individual_analysis_14.Rdata? I can provide more details by looking at the output. Thanks.

Best, Xihao

DamienTan commented 1 year ago

Hi Damien,

Glad that your previous question has been resolved. For your follow-up question, could you please paste the header and dimension of plt_results_individual_analysis_14.Rdata? I can provide more details by looking at the output. Thanks.

Best, Xihao

It is NULL in plt_results_individual_analysis_14.Rdata. Here are the other output files using Rscript STAARpipeline_Individual_Analysis.r 13 and Rscript STAARpipeline_Individual_Analysis.r 15: individual_analysis_13 individual_analysis_15 I think the problem maybe is the incorrect use of the Rscript STAARpipeline_Individual_Analysis.r <int number> due to my misunderstanding. In the screenshots above, I found that the second column contained only chromesome 1. But to my mind, it should be contained chromsome 13 or 15 which was consistent with the command line argument.

xihaoli commented 1 year ago

Hi Damian,

Thanks for sharing and now I get it. For individual variant analysis, the number of output files is the summation of the column "individual_analysis_num" for the object in jobs_num.Rdata. In our example, we submitted a total of 293 jobs in the SLURM cluster. For your case, the <int number> in your command Rscript STAARpipeline_Individual_Analysis.r <int number> does not represent the chromosome number but the job number.

Therefore, you would need to run the scripts for a total of sum(jobs_num$individual_analysis_num) times instead of 22 times by specifying Rscript STAARpipeline_Individual_Analysis.r 1, Rscript STAARpipeline_Individual_Analysis.r 2, etc. Finally, it is possible for your command Rscript STAARpipeline_Individual_Analysis.r 14 to give a NULL result since we are splitting the job based on physical positions, and it is possible for a particular region in chromosome 1 that does not contain any variant.

Hope this is clear.

Best, Xihao

DamienTan commented 1 year ago

Gotcha, I ran sum(jobs_num$individual_analysis_num) = 294 just now. That means I should run commands from Rscript STAARpipeline_Individual_Analysis.r 1 to Rscript STAARpipeline_Individual_Analysis.r 294. Maybe I could write a for loop in a shell script to run this step.

xihaoli commented 1 year ago

Yes, that is correct. Glad that you've got the point.

DamienTan commented 1 year ago

Yes, that is correct. Glad that you've got the point.

Thanks again. I will try to finish this step, and there may be other problems in the next steps. Hope to ask you for help again.