rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

ERROR: No sets are left to be analyzed. in gene burden testing #112

Closed: OmegaPetrazzini closed this issue 3 years ago

OmegaPetrazzini commented 3 years ago

Hi, I am trying to run gene burden tests on LoF variants. I only have one mask, so my mask file looks like this:

Mask1 LoF

And my annotation file looks like this:

1:69849:G:A OR4F5 exon LoF
1:69869:T:A OR4F5 exon LoF
1:925960:C:T SAMD11 exon LoF
1:925996:C:T SAMD11 exon LoF
1:930221:C:T SAMD11 exon LoF
1:931053:G:A SAMD11 exon LoF

I am getting the error in the title when running regenie with these options:

Options in effect:
  --step 2 \
  --bed UKBexomeOQFE_chr5 \
  --phenoFile phenotype_file.txt \
  --covarFile covariates_file.txt \
  --anno-file Anno_file.txt \
  --set-list Set_list.txt \
  --mask-def Mask_file.txt \
  --pred fit_l1_pred.list \
  --bsize 500 \
  --out regenie_gene_burden_chr5

Association testing mode (joint tests) with fast multithreading using OpenMP

It looks like regenie is not finding the "LoF" annotation in Anno_file.txt. Any help or suggestions would be much appreciated.

Thank you!

joellembatchou commented 3 years ago

Hi,

It seems that the variants in the annotation file are not being recognized as present in the genotype file. Could you check whether there are any special characters in the annotation file (e.g. with cat -A Anno_file.txt)?
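If cat -A is not available, or you want to scan a file programmatically, the Python sketch below approximates its output. It is a simplified emulation, not GNU cat itself: line ends are shown as $, control characters in ^ notation (a Windows carriage return appears as ^M, a tab as ^I), and non-ASCII bytes with the M- prefix. The function names show_special and suspicious_lines are hypothetical.

```python
import sys


def show_special(data: bytes) -> str:
    """Render raw bytes roughly the way `cat -A` does (simplified sketch)."""
    out = []
    for b in data:
        if b == 10:  # newline: cat -A marks the end of each line with '$'
            out.append("$\n")
            continue
        prefix = ""
        if b >= 128:  # non-ASCII byte: 'M-' followed by its low 7 bits
            prefix = "M-"
            b -= 128
        if b < 32:  # control characters, e.g. \r -> ^M, \t -> ^I
            out.append(prefix + "^" + chr(b + 64))
        elif b == 127:  # DEL
            out.append(prefix + "^?")
        else:
            out.append(prefix + chr(b))
    return "".join(out)


def suspicious_lines(data: bytes) -> list[int]:
    """Return 1-based line numbers containing bytes other than printable ASCII or tab."""
    bad = []
    for i, line in enumerate(data.split(b"\n"), start=1):
        if any(not (32 <= b < 127 or b == 9) for b in line):
            bad.append(i)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (illustrative): python show_special.py Anno_file.txt
    with open(sys.argv[1], "rb") as f:
        raw = f.read()
    sys.stdout.write(show_special(raw))
    print("lines with special characters:", suspicious_lines(raw))
```

For example, a line saved with a Windows line ending, b"1:69849:G:A OR4F5 exon LoF\r\n", is rendered as 1:69849:G:A OR4F5 exon LoF^M$ and reported by suspicious_lines.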

seuchoi commented 3 years ago

I got the same error message.

OmegaPetrazzini commented 3 years ago

That was it, thank you!

My variant IDs were in a different format from the ones in the genotype file: they were e.g. 1:69849:G:A and should have been e.g. chr1_69849_G_A.

Hope this helps, seuchoi.
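Since the underlying problem is that annotation variants are not recognized as present in the genotype file, one way to catch an ID-format mismatch before running regenie is to list the annotation IDs that have no exact match in the PLINK .bim file, whose second column holds the variant ID. A minimal Python sketch; missing_ids and the file names in the usage comment are illustrative assumptions. Note that str.split() also discards trailing carriage returns, so this checks the ID format only, not invisible characters.

```python
import sys


def missing_ids(anno_lines, bim_lines):
    """Return annotation variant IDs (first column) not found in the .bim file.

    In a PLINK .bim file the variant ID is the second whitespace-separated column.
    """
    bim_ids = {line.split()[1] for line in bim_lines if line.strip()}
    missing = []
    for line in anno_lines:
        fields = line.split()
        if fields and fields[0] not in bim_ids:
            missing.append(fields[0])
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (illustrative): python check_ids.py Anno_file.txt UKBexomeOQFE_chr5.bim
    with open(sys.argv[1]) as anno, open(sys.argv[2]) as bim:
        for vid in missing_ids(anno, bim):
            print(vid)
```

If every annotation ID is printed, the two files are almost certainly using different ID conventions, as in the 1:69849:G:A vs. chr1_69849_G_A case above.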

seuchoi commented 3 years ago

Oh... got it. I had the same issue. My IDs were e.g. 1:69849:G:A and should have been e.g. chr1:69849:G:A. Thanks!
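The two fixes in this thread differ only in the target format (chr1_69849_G_A vs. chr1:69849:G:A), so the right target is whatever IDs your own .bim file uses. A minimal Python sketch that rewrites the first column of an annotation file from CHR:POS:REF:ALT to a chosen prefix and separator, leaving the rest of each line (including its whitespace and line ending) untouched. The names convert_id and convert_line are hypothetical; IDs that are not four colon-separated parts, such as rsIDs, are passed through unchanged.

```python
import re
import sys

# leading whitespace, first field, then the rest of the line verbatim
_FIRST_FIELD = re.compile(r"(\s*)(\S+)(.*)", re.DOTALL)


def convert_id(variant_id: str, prefix: str = "chr", sep: str = "_") -> str:
    """Convert e.g. '1:69849:G:A' to 'chr1_69849_G_A' (or 'chr1:69849:G:A' with sep=':')."""
    parts = variant_id.split(":")
    if len(parts) != 4:
        return variant_id  # not CHR:POS:REF:ALT (e.g. an rsID): leave as-is
    chrom, pos, ref, alt = parts
    if not chrom.startswith(prefix):
        chrom = prefix + chrom
    return sep.join([chrom, pos, ref, alt])


def convert_line(line: str, prefix: str = "chr", sep: str = "_") -> str:
    """Rewrite only the first column of an annotation line."""
    m = _FIRST_FIELD.match(line)
    if m is None:  # blank line: nothing to convert
        return line
    return m.group(1) + convert_id(m.group(2), prefix, sep) + m.group(3)


if __name__ == "__main__" and len(sys.argv) == 5:
    # usage (illustrative): python convert_ids.py Anno_file.txt Anno_fixed.txt chr _
    src, dst, prefix, sep = sys.argv[1:]
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        for line in fin:
            fout.write(convert_line(line, prefix, sep))
```

Running it with separator _ reproduces the first fix above; with separator : it reproduces the second.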

Shicheng-Guo commented 3 years ago

I think I have the same issue. There is a "$" sign at the end of each line that can only be seen with cat -A. I have tried rewriting the file with sed and a Perl script, but the "$" signs are always still there. Any suggestions on how to remove them?

rs1490922004 OR2L8 PD4$
chr1_247948931_T_C OR2L8 PD1$
rs779314525 OR2L8 PD4$
chr1_247949502_T_G OR2L8 PD2$

Thanks.

joellembatchou commented 3 years ago

The "$" is fine: it only indicates the end of the line, so regenie will recognize it as separate from the annotations (PD4, PD1, ...).
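To tell the harmless case from a real problem: a plain $ is how cat -A displays a normal Unix (LF) line ending and is not a character stored in the file, which is presumably why sed and Perl could not remove it. Only ^M$ would indicate a Windows (CRLF) ending with an extra carriage return. A minimal Python sketch, with hypothetical helper names, that counts both kinds of line ending and writes an LF-only copy if any CRLF endings are found.

```python
import sys


def count_line_endings(data: bytes) -> dict:
    """Count Unix (LF) and Windows (CRLF) line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # every CRLF also contains one LF
    return {"LF": lf, "CRLF": crlf}


def to_unix(data: bytes) -> bytes:
    """Replace CRLF endings with LF; LF-only files come back unchanged."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage (illustrative): python line_endings.py Anno_file.txt
    path = sys.argv[1]
    with open(path, "rb") as f:
        raw = f.read()
    counts = count_line_endings(raw)
    print(counts)
    if counts["CRLF"]:
        # write a converted copy instead of overwriting the original
        with open(path + ".lf", "wb") as f:
            f.write(to_unix(raw))
```

For lines like the ones shown above (rs1490922004 OR2L8 PD4 followed by $), the CRLF count is 0 and nothing needs to change.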