omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License
94 stars, 22 forks

TypeError: __init__() got an unexpected keyword argument 'normalize' #182

Closed: Y-Isaac closed this issue 8 months ago

Y-Isaac commented 8 months ago

Hi,

When I use polyfun.py to re-estimate per-SNP heritabilities via S-LDSC, I get an error:

[INFO] Reading summary statistics from /public/home/P202306/polyfun_test/summary/pheno1_munged.parquet ...
[INFO] Read summary statistics for 13087844 SNPs.
[INFO] Reading reference panel LD Score from /public/home/P202306/polyfun_test/output/pheno1/pheno1.[1-22] ...
[INFO] Read reference panel LD Scores for 13156184 SNPs.
[INFO] Reading regression weight LD Score from /public/home/P202306/polyfun_test/ldscore/weight/[1-22] ...
[INFO] Read regression weight LD Scores for 13156184 SNPs.
[INFO] After merging with reference panel LD, 13087844 SNPs remain.
[INFO] After merging with regression SNP LD, 13087844 SNPs remain.
[INFO] Removed 183 SNPs with chi^2 > 431.334 (13087661 SNPs remain)
Traceback (most recent call last):
  File "/public/home/P202306/software/polyfun/polyfun.py", line 849, in <module>
    polyfun_obj.polyfun_main(args)
  File "/public/home/P202306/software/polyfun/polyfun.py", line 780, in polyfun_main
    self.compute_h2_bins(args, constrain_range=True)
  File "/public/home/P202306/software/polyfun/polyfun.py", line 757, in compute_h2_bins
    self.run_ldsc(args, use_ridge=False, nn=True, evenodd_split=True, keep_large=False)
  File "/public/home/P202306/software/polyfun/polyfun.py", line 217, in run_ldsc
    hsqhat = regressions.Hsq(chisq,
  File "/public/home/P202306/software/polyfun/ldsc_polyfun/regressions.py", line 401, in __init__
    LD_Score_Regression.__init__(self, y, x, w, N, M, n_blocks, intercept=intercept,
  File "/public/home/P202306/software/polyfun/ldsc_polyfun/regressions.py", line 243, in __init__
    jknife = jk.LstsqJackknifeSlow(x, y, is_large_chi2, n_blocks, evenodd_split=evenodd_split, nn=True, chr_num=chr_num, nnls_exact=nnls_exact)
  File "/public/home/P202306/software/polyfun/ldsc_polyfun/jackknife.py", line 267, in __init__
    lasso = Lasso(alpha=1e-100, fit_intercept=False, normalize=False, precompute=xtx, positive=True, max_iter=10000, random_state=0)
TypeError: __init__() got an unexpected keyword argument 'normalize'

Here is the command I ran, in case it helps:

python ~/software/polyfun/polyfun.py \
  --compute-h2-bins \
  --output-prefix /public/home/P202306/polyfun_test/output/pheno1/pheno1 \
  --sumstats /public/home/P202306/polyfun_test/summary/pheno1_munged.parquet \
  --w-ld-chr /public/home/P202306/polyfun_test/ldscore/weight/

jdblischak commented 8 months ago
lasso = Lasso(alpha=1e-100, fit_intercept=False, normalize=False, precompute=xtx, positive=True, max_iter=10000, random_state=0)
TypeError: __init__() got an unexpected keyword argument 'normalize'

It looks like Lasso no longer accepts the argument normalize, most likely because of a difference in scikit-learn versions. Could you please try running your code in the locked conda environment, polyfun.yml.lock, which is known to work with the polyfun scripts:

mamba create --name polyfun --file polyfun.yml.lock
conda activate polyfun
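The error itself is ordinary Python behavior: passing a keyword argument that a constructor no longer declares raises TypeError. The stdlib-only sketch below reproduces it and shows one defensive workaround. `NewLasso` is a hypothetical stand-in for a post-removal `Lasso` signature (it is not the real scikit-learn class), and `drop_unsupported_kwargs` is a hypothetical helper, not part of polyfun or scikit-learn. Silently dropping `normalize=False` is only reasonable here because False matched the old default behavior.

```python
import inspect


class NewLasso:
    """Hypothetical stand-in for a Lasso whose __init__ no longer has `normalize`."""

    def __init__(self, alpha=1.0, fit_intercept=True, precompute=False,
                 positive=False, max_iter=1000, random_state=None):
        self.alpha = alpha
        self.fit_intercept = fit_intercept
        self.precompute = precompute
        self.positive = positive
        self.max_iter = max_iter
        self.random_state = random_state


def drop_unsupported_kwargs(cls, **kwargs):
    """Return only the keyword arguments that cls.__init__ declares.

    If __init__ accepts **kwargs, nothing is dropped.
    """
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Keyword arguments from the failing call in jackknife.py.
# precompute=xtx (a Gram matrix) is replaced by a placeholder here.
old_kwargs = dict(alpha=1e-100, fit_intercept=False, normalize=False,
                  precompute=False, positive=True, max_iter=10000, random_state=0)

try:
    NewLasso(**old_kwargs)  # fails just like the traceback above
except TypeError as err:
    print("TypeError:", err)

# Same call with the unsupported `normalize` argument filtered out.
model = NewLasso(**drop_unsupported_kwargs(NewLasso, **old_kwargs))
```

Simply deleting the argument from the call, as done by hand later in this thread, is the more direct fix; the signature check is only useful if the code must run on both old and new scikit-learn releases.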
Y-Isaac commented 8 months ago

@jdblischak Hi,

Following your advice, I set up an environment from polyfun.yml.lock. Since mamba is not available on my server, I used the following command instead, hoping it would have the same effect:

conda create --name polyfun-lock --file polyfun.yml.lock

Unfortunately, this didn't help: I got the same error again. I checked the scikit-learn (sklearn) version in both environments. It is 1.2.2 in polyfun-lock and 1.3.2 in my original polyfun environment, and both produce the same error.

Thank you very much for your help!

Y-Isaac commented 8 months ago

As an experiment, I removed the normalize=False argument from lines 267 and 295 of jackknife.py (I'm not sure this is a safe change; it was only a trial), and this time the run completed. The prior probabilities for chromosome 22 range from 7.19e-7 (highest) to 9.09e-9 (lowest), which looks normal compared with the example file.

I look forward to your advice on what to do next. Have a nice day!

jdblischak commented 8 months ago

I looked into the normalize argument. It turns out it was deprecated in scikit-learn 1.0.0 and scheduled for removal in 1.2:

API Change : The parameter normalize of linear_model.LinearRegression is deprecated and will be removed in 1.2. Motivation for this deprecation: normalize parameter did not take any effect if fit_intercept was set to False and therefore was deemed confusing. The behavior of the deprecated LinearModel(normalize=True) can be reproduced with a Pipeline with LinearModel (where LinearModel is LinearRegression, Ridge, RidgeClassifier, RidgeCV or RidgeClassifierCV) as follows: make_pipeline(StandardScaler(with_mean=False), LinearModel()). The normalize parameter in LinearRegression was deprecated in #17743 by Maria Telenczuk and Alexandre Gramfort. Same for Ridge, RidgeClassifier, RidgeCV, and RidgeClassifierCV, in: #17772 by Maria Telenczuk and Alexandre Gramfort. Same for BayesianRidge, ARDRegression in: #17746 by Maria Telenczuk. Same for Lasso, LassoCV, ElasticNet, ElasticNetCV, MultiTaskLasso, MultiTaskLassoCV, MultiTaskElasticNet, MultiTaskElasticNetCV, in: #17785 by Maria Telenczuk and Alexandre Gramfort.
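Since removal was scheduled for 1.2, the report above is consistent: both 1.2.2 and 1.3.2 reject the argument, which is why the locked environment did not help. A minimal stdlib-only sketch of a version check follows, assuming (as the changelog and the failing 1.2.2 run suggest) that `normalize` is accepted only before 1.2. The helper names are hypothetical, and the parsing deliberately looks only at major.minor.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_major_minor(ver):
    """Return (major, minor) from a version string such as '1.2.2' or '1.3.0rc1'."""
    parts = ver.split(".")
    major = int(parts[0])
    # Keep only the leading digits of the minor component (e.g. '0rc1' -> 0).
    minor_digits = ""
    for ch in (parts[1] if len(parts) > 1 else "0"):
        if not ch.isdigit():
            break
        minor_digits += ch
    return major, int(minor_digits or 0)


def lasso_accepts_normalize(sklearn_version):
    """Assumption: `normalize` was deprecated in 1.0 and removed in 1.2."""
    return parse_major_minor(sklearn_version) < (1, 2)


if __name__ == "__main__":
    try:
        v = version("scikit-learn")
        print(f"scikit-learn {v}: normalize accepted = {lasso_accepts_normalize(v)}")
    except PackageNotFoundError:
        print("scikit-learn is not installed in this environment")
```

For example, `lasso_accepts_normalize("1.1.3")` is True, while `"1.2.2"` and `"1.3.2"` both give False.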

So now I'm puzzled why this wasn't caught earlier, since we do test polyfun.py --compute-h2-bins:

https://github.com/omerwe/polyfun/blob/00afe717ed157411ab78c54f3ec180e77abf47f9/test_polyfun.py#L144-L146

Ah, it's because of the flag --nnls-exact used in the test. That bypasses the call to Lasso():

https://github.com/omerwe/polyfun/blob/00afe717ed157411ab78c54f3ec180e77abf47f9/ldsc_polyfun/jackknife.py#L261-L270

https://github.com/omerwe/polyfun/blob/00afe717ed157411ab78c54f3ec180e77abf47f9/ldsc_polyfun/jackknife.py#L290-L298
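Because the existing test always passes --nnls-exact, the Lasso() branch never ran against a newer scikit-learn. One way to close that gap, sketched here under assumptions, is to run the same --compute-h2-bins command both with and without the flag. The helper name and file paths are hypothetical; the flags are the ones that appear in this thread, and the real test in test_polyfun.py may pass other arguments as well.

```python
def build_compute_h2_bins_cmd(sumstats, w_ld_chr, output_prefix,
                              nnls_exact=False, script="polyfun.py"):
    """Build a hypothetical polyfun.py --compute-h2-bins command as an argv list.

    With nnls_exact=True the run bypasses Lasso(); with nnls_exact=False
    it goes through the Lasso() call in jackknife.py.
    """
    cmd = ["python", script, "--compute-h2-bins",
           "--output-prefix", output_prefix,
           "--sumstats", sumstats,
           "--w-ld-chr", w_ld_chr]
    if nnls_exact:
        cmd.append("--nnls-exact")
    return cmd


# Hypothetical example paths; running both variants would cover both code paths.
for flag in (True, False):
    print(" ".join(build_compute_h2_bins_cmd(
        "example.sumstats.parquet", "example_weights/", "output/example",
        nnls_exact=flag)))
```

Each argv list could then be passed to `subprocess.run(..., check=True)` inside a parametrized test, so a removed scikit-learn argument fails CI instead of surfacing in user reports.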

Y-Isaac commented 8 months ago

@jdblischak Oh, I see. Thanks for your help! I'll close this issue now.

omerwe commented 8 months ago

@Y-Isaac thanks for flagging this! I've accepted pull request #183, so the problem should be fixed for everyone now (thanks @jdblischak!)