xihaoli / STAAR

An R package for performing STAAR procedure in whole-genome sequencing studies
GNU General Public License v3.0
90 stars 42 forks source link

Error when running the Gene_Centric_Noncoding function #18

Closed ryshi06 closed 1 month ago

ryshi06 commented 1 month ago

Hi there, I followed the entire pipeline to generate the aGDS file and fit the null model without a kinship matrix (by assigning kins = NULL) and everything went smoothly. However, when I tried to run the non-coding centric analysis, it gives me the error saying Error in STAAR(Geno, obj_nullmodel, Anno.Int.PHRED.sub, rare_maf_cutoff = rare_maf_cutoff, : genotype is not a matrix!. I am not sure why this occurs if all above steps ran smoothly. Could you please help me on this, thanks!

`# run model for (arrayid in 1:22){

gene number in job

gene_num_in_array <- 50 group.num.allchr <- ceiling(table(genes_info[,2])/gene_num_in_array) sum(group.num.allchr)

chr <- which.max(arrayid <= cumsum(group.num.allchr)) group.num <- group.num.allchr[chr]

if (chr == 1){ groupid <- arrayid }else{ groupid <- arrayid - cumsum(group.num.allchr)[chr-1] }

genes_info_chr <- genes_info[genes_info[,2]==chr,] sub_seq_num <- dim(genes_info_chr)[1]

if(groupid < group.num) { sub_seq_id <- ((groupid - 1)gene_num_in_array + 1):(groupidgene_num_in_array) }else { sub_seq_id <- ((groupid - 1)*gene_num_in_array + 1):sub_seq_num }

exclude large noncoding masks

jobid_exclude <- c(21,39,44,45,46,53,55,83,88,103,114,127,135,150,154,155,163,164,166,180,189,195,200,233,280,285,295,313,318,319,324,327,363,44,45,54) sub_seq_id_exclude <- c(1009,1929,182,214,270,626,741,894,83,51,611,385,771,493,671,702,238,297,388,352,13,303,600,170,554,207,724,755,1048,319,324,44,411,195,236,677)

for(i in 1:length(jobid_exclude)) { if(arrayid==jobid_exclude[i]) { sub_seq_id <- setdiff(sub_seq_id,sub_seq_id_exclude[i]) } }

aGDS file

agds.path <- agds_dir[chr] genofile <- seqOpen(agds.path)

results_noncoding <- c() for(kk in sub_seq_id) { print(kk) gene_name <- genes_info_chr[kk,1] results <- Gene_Centric_Noncoding(chr=chr,gene_name=gene_name,genofile=genofile,obj_nullmodel=obj_nullmodel, rare_maf_cutoff=0.05,rv_num_cutoff=2, QC_label=QC_label,variant_type=variant_type,geno_missing_imputation=geno_missing_imputation, Annotation_dir=Annotation_dir,Annotation_name_catalog=Annotation_name_catalog, Use_annotation_weights=Use_annotation_weights,Annotation_name=Annotation_name) results_noncoding <- append(results_noncoding,results) }

save(results_noncoding,file=paste0(output_path,output_filename,"",arrayid,".Rdata"))

seqClose(genofile) } `

xihaoli commented 1 month ago

Hi @ryshi06,

Thanks for checking. These messages are not errors from your scripts, but they serve as indications of certain functional categories of genes that do not have at least 2 rare variants to form a variant set. You can proceed normally.

Best, Xihao

ryshi06 commented 1 month ago

Hello Dr. Li,

Thanks for your timely responses! I have another quick question, my aGDS files have more samples than the QC-passed phenotype file, will the program run the analysis based on overlapped samples only?

Best, Ruyu

xihaoli commented 1 month ago

Hi @ryshi06,

No problem. Typically, one would run the analysis for samples with both phenotype and genotype data available. If you would like to include more samples, then some "missing data" handling would be needed, and this paper could be an option.

Best, Xihao

ryshi06 commented 1 month ago

Thank you!

xihaoli commented 1 month ago

Sure, I'll close this issue for now, but please feel free to reopen it if you have any other questions.