I would like to run REGENIE on many (~20K) quantitative phenotypes. For now, I only need a simple linear regression model (step 2). I want to run this as one job rather than as 20K separate jobs (hence my draw toward REGENIE over other algorithms). The code that I have written for this is as follows:
When I run this, several errors pop up indicating fully empty columns (e.g. ERROR: all individuals have missing/invalid values for phenotype 'S-007-162_A0A075B6S9'.). So I added a --phenoExcludeList to the parameters that takes out all columns that have 1) all values missing or 2) fewer than 1 unique value.
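For anyone wanting to reproduce this filtering step, here is a minimal sketch of how such an exclude list could be built. It assumes the intent is to drop columns that are entirely missing or constant (hence the hypothetical `min_unique=2` default), that phenotypes are already loaded as a dict of column name to values, and that --phenoExcludeList accepts a comma-separated list of names. The function name and the toy values are made up for illustration; only the two S-... phenotype names come from the post.

```python
import math


def phenotypes_to_exclude(columns, min_unique=2):
    """Return names of columns with fewer than `min_unique` distinct non-missing values.

    `columns` maps phenotype name -> list of floats, with None or NaN meaning missing.
    This is a hypothetical helper, not part of REGENIE.
    """
    excluded = []
    for name, values in columns.items():
        # Collect distinct observed values, ignoring missing entries.
        observed = {v for v in values if v is not None and not math.isnan(v)}
        if len(observed) < min_unique:
            excluded.append(name)
    return excluded


# Toy data: the values are invented for illustration.
toy = {
    "S-007-162_A0A075B6S9": [float("nan")] * 4,        # all missing
    "constant_pheno": [1.5, 1.5, float("nan"), 1.5],   # a single unique value
    "S-229-079_Q9Y6Z7": [0.1, 0.4, 0.7, 0.2],          # varies, so it is kept
}

# Assumed format for --phenoExcludeList: comma-separated phenotype names.
exclude_arg = ",".join(phenotypes_to_exclude(toy))
print(exclude_arg)
```

Note that a filter like this only looks at the raw values, so it cannot catch phenotypes that become constant only after covariates are regressed out.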
However, after doing this preprocessing, different errors pop up:
* no step 1 predictions given. Simple linear regression will be performed
-residualizing and scaling phenotypes...ERROR: phenotype 'S-229-079_Q9Y6Z7' has sd=0.
When I look at the raw data, this phenotype doesn't have an SD of zero; it has an SD of 0.3. Is something happening when I input all 20K phenotypes, where some phenotypes are getting "standardized out"? Is there a way to control the standardization process? Thanks in advance.
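The sd is computed after projecting out covariates; the error message indicates that after projecting out covariates (i.e., taking residuals), the phenotype "S-229-079_Q9Y6Z7" has sd=0. In other words, among the samples being analyzed, the phenotype is fully explained by the covariates, so nothing is left to test even though the raw sd is nonzero.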
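To see how a phenotype with raw sd around 0.3 can end up with residual sd of 0, here is a small NumPy sketch of the check described above. It is not REGENIE's internal code: the function name `residual_sd`, the ordinary least squares fit with an intercept, and the toy batch covariate are all assumptions made for illustration, and it assumes there are no missing values.

```python
import numpy as np


def residual_sd(phenotypes, covariates):
    """Sd of each phenotype column after regressing out covariates plus an intercept.

    phenotypes: (n_samples, n_phenotypes) array, no missing values assumed.
    covariates: (n_samples, n_covariates) array.
    """
    n = phenotypes.shape[0]
    X = np.column_stack([np.ones(n), covariates])
    # Least-squares fit of every phenotype on the covariates at once.
    beta, *_ = np.linalg.lstsq(X, phenotypes, rcond=None)
    residuals = phenotypes - X @ beta
    return residuals.std(axis=0, ddof=1)


# Toy example: one binary covariate (e.g. a batch indicator).
batch = np.array([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
pheno = np.column_stack([
    [0.2, 0.2, 0.2, 0.8, 0.8, 0.8],  # determined entirely by batch
    [0.1, 0.5, 0.3, 0.9, 0.2, 0.6],  # varies within batch
])

raw = pheno.std(axis=0, ddof=1)
res = residual_sd(pheno, batch)
flagged = res < 1e-8  # tolerance is an arbitrary choice for this sketch
print(raw, res, flagged)
```

Running a check along these lines on the same samples and covariates passed to REGENIE should show which phenotypes to add to the exclude list. The first toy phenotype has a raw sd of about 0.33 but is flagged because the covariate explains it completely.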