omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Run PolyFun with L2-regularized S-LDSC warnings #91

Closed: gbloeb closed this issue 2 years ago

gbloeb commented 2 years ago

Not sure if these warnings are connected or separate:

[WARNING]  R[write to console]: In cluster.1d.dp(x, k, y = 1, method, estimate.k, "L1", deparse(substitute(x)),  :
[WARNING]  R[write to console]:

[WARNING]  R[write to console]:  Max number of clusters used. Consider increasing k!

1) Any advice on how much to increase k? 2) Is the first warning connected to the second one?

Full log below:

[INFO]  Reading summary statistics from /wynton/protected/home/reiter/gloeb/group/polyfun/sumstats/munged_hg19_egfr_creat_cys_polyfunVariantmatched.parquet ...
[INFO]  Reading summary statistics from /wynton/protected/home/reiter/gloeb/group/polyfun/sumstats/munged_hg19_egfr_creat_cys_polyfunVariantmatched.parquet ...
[INFO]  Read summary statistics for 15082583 SNPs.
[INFO]  Reading reference panel LD Score from /wynton/protected/home/reiter/gloeb/group/polyfun/baselineLF2.2.UKB/baselineLF2.2.UKB.[1-22] ...
[INFO]  Read reference panel LD Scores for 19386297 SNPs.
[INFO]  Reading regression weight LD Score from /wynton/protected/home/reiter/gloeb/group/polyfun/baselineLF2.2.UKB/weights.UKB.[1-22] ...
[INFO]  Read regression weight LD Scores for 18275613 SNPs.
[INFO]  After merging with reference panel LD, 15054077 SNPs remain.
[INFO]  After merging with regression SNP LD, 14642807 SNPs remain.
[INFO]  Removed 1116 SNPs with chi^2 > 358.326 (14641691 SNPs remain)
[INFO]  iterating over chromosomes to compute XTX, XTy...
[INFO]  Evaluating Ridge lambdas...
[INFO]  Selected ridge lambda: 5.8888e-02 (68/100)  score: 9.1015e-02  score lstsq: 9.0280e-02
[INFO]  Estimating annotation coefficients for each chromosomes set
[INFO]  Computing per-SNP h^2 for each chromosome...
[INFO]  Saving constrained SNP variances to disk
[INFO]  Reading summary statistics from /wynton/protected/home/reiter/gloeb/group/polyfun/sumstats/munged_hg19_egfr_creat_cys_polyfunVariantmatched.parquet ...
[INFO]  Read summary statistics for 15082583 SNPs.
[INFO]  Reading reference panel LD Score from /wynton/protected/home/reiter/gloeb/group/polyfun/baselineLF2.2.UKB/baselineLF2.2.UKB.[1-22] ...
[INFO]  Read reference panel LD Scores for 19386297 SNPs.
[INFO]  Reading regression weight LD Score from /wynton/protected/home/reiter/gloeb/group/polyfun/baselineLF2.2.UKB/weights.UKB.[1-22] ...
[INFO]  Read regression weight LD Scores for 18275613 SNPs.
[INFO]  After merging with reference panel LD, 15054077 SNPs remain.
[INFO]  After merging with regression SNP LD, 14642807 SNPs remain.
[INFO]  Removed 1116 SNPs with chi^2 > 358.326 (14641691 SNPs remain)
[INFO]  iterating over chromosomes to compute XTX, XTy...
[INFO]  Evaluating Ridge lambdas...
[INFO]  Selected ridge lambda: 5.8888e-02 (68/100)  score: 9.1015e-02  score lstsq: 9.0280e-02
[INFO]  Estimating annotation coefficients for each chromosomes set
[INFO]  Computing per-SNP h^2 for each chromosome...
[INFO]  Saving constrained SNP variances to disk
[INFO]  Saving SNP variances to disk
[INFO]  Clustering SNPs into bins using the R Ckmeans.1d.dp package
[INFO]  Determining the optimal number of bins (if this is slow, consider using --num-bins 20 (or some other number))
[INFO]  Ckmedian.1d.dp partitioned SNPs into 30 bins
[INFO]  Saving SNP-bins to disk
[WARNING]  R[write to console]: Warning message:

[WARNING]  R[write to console]: In cluster.1d.dp(x, k, y = 1, method, estimate.k, "L1", deparse(substitute(x)),  :
[WARNING]  R[write to console]:

[WARNING]  R[write to console]:  Max number of clusters used. Consider increasing k!

omerwe commented 2 years ago

Hi, I haven't seen this warning before, but I suspect it's not critical. It may indicate that your data is heterogeneous, so the optimal number of clusters is large. You can try setting the number of bins manually, e.g. with --num-bins 40, but I don't think it matters much; 30 bins should be enough for most purposes.
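
On the second question: the layout in the log (a `Warning message:` line, then an `In <call> :` line, then the message text) matches how R normally prints a single warning, so the `In cluster.1d.dp(...)` line and the `Max number of clusters used` line are most likely two parts of one warning, not two separate ones. Below is a minimal Python sketch that stitches the fragments back together, assuming the `[WARNING]  R[write to console]:` prefix shown in the log above. The helper name `stitch_r_warnings` is hypothetical and not part of PolyFun.

```python
import re

# Prefix carried by each forwarded R console fragment, as it appears in the log above.
R_PREFIX = re.compile(r"^\[WARNING\]\s+R\[write to console\]:\s?(.*)$")


def stitch_r_warnings(log_lines):
    """Group consecutive R console fragments into whole warning messages.

    Hypothetical helper for illustration only; it assumes the log format
    shown in this thread.
    """
    warnings = []
    current = []

    def flush():
        # Store the fragments collected so far as one message.
        if current:
            warnings.append(" ".join(current))
            current.clear()

    for line in log_lines:
        match = R_PREFIX.match(line.rstrip("\n"))
        if match:
            fragment = match.group(1).strip()
            if fragment == "Warning message:":
                flush()  # R's header line starts a new warning
            elif fragment:
                current.append(fragment)
        elif line.strip():
            flush()  # any other non-blank log line ends the warning
    flush()
    return warnings


log = [
    "[INFO]  Saving SNP-bins to disk",
    "[WARNING]  R[write to console]: Warning message:",
    "",
    '[WARNING]  R[write to console]: In cluster.1d.dp(x, k, y = 1, method, estimate.k, "L1", deparse(substitute(x)),  :',
    "[WARNING]  R[write to console]:",
    "",
    "[WARNING]  R[write to console]:  Max number of clusters used. Consider increasing k!",
]

stitched = stitch_r_warnings(log)
for message in stitched:
    print(message)
```

Run on the tail of the log above, this yields a single message that contains both the `cluster.1d.dp` call and the "Consider increasing k!" text, which is consistent with reading the two lines as one warning.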