isayevlab / pKa-ANI

Accurate prediction of protein pKa with representation learning

update scikit-learn dependency #8

Closed lpravda closed 4 months ago

lpravda commented 4 months ago

Hello developers, the version of scikit-learn that pKa-ANI requires is rather old (1.0.2). The current release is 1.4.2, with 1.5.0 on the way. Is there any plan to update that dependency? If I run pKa-ANI with the latest scikit-learn, I get the following exception:


/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/sklearn/base.py:376: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeRegressor from version 1.0.2 when using version 1.4.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/Users/lpravda/mambaforge/envs/fresh/bin/pkaani", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/pkaani/run.py", line 111, in main
    pkadict=calculate_pka(pdbfiles,writefile=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/pkaani/pkaani.py", line 30, in calculate_pka
    asp_model=joblib.load(os.path.join(os.path.dirname(__file__),'models/ASP_ani2x_FINAL_MODEL_F100.joblib'))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
          ^^^^^^^^^^^^^^^^
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/pickle.py", line 1213, in load
    dispatch[key[0]](self)
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/site-packages/joblib/numpy_pickle.py", line 402, in load_build
    Unpickler.load_build(self)
  File "/Users/lpravda/mambaforge/envs/fresh/lib/python3.11/pickle.py", line 1718, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 865, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1571, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

Thank you!
HGokcan commented 4 months ago

Hello. The current pKa-ANI models were trained with an older version of scikit-learn (1.0.2), which is why that specific version is required. pKa-ANI uses scikit-learn only to load the saved models, so loading them with any other version causes errors like the one above. Future pKa-ANI models will be trained with the latest version of scikit-learn.
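For readers hitting the same traceback, here is a minimal sketch of a version guard that fails early with a readable message instead of the dtype error. It is not part of pKa-ANI: the names `REQUIRED_SKLEARN`, `sklearn_matches`, and `load_model_checked` are hypothetical, and the pinned version 1.0.2 is taken from the warning in the report above.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: 1.0.2 is the scikit-learn version the bundled models were
# pickled with, as reported by the InconsistentVersionWarning above.
REQUIRED_SKLEARN = "1.0.2"


def sklearn_matches(required: str = REQUIRED_SKLEARN) -> bool:
    """Return True only if the installed scikit-learn equals `required`."""
    try:
        return version("scikit-learn") == required
    except PackageNotFoundError:
        # scikit-learn is not installed at all
        return False


def load_model_checked(path: str, required: str = REQUIRED_SKLEARN):
    """Hypothetical wrapper: check the version first, then unpickle."""
    if not sklearn_matches(required):
        raise RuntimeError(
            f"Models were saved with scikit-learn {required}; "
            "install that exact version before loading them."
        )
    import joblib  # imported lazily so the check itself does not need joblib

    return joblib.load(path)
```

A guard like this does not make the old pickles loadable under a newer scikit-learn. It only replaces the low-level `ValueError` with a message that names the required version.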

lpravda commented 4 months ago

Hi @HGokcan, I completely understand, thanks a lot. However, this limits pKa-ANI to Python versions up to 3.10. Is there a timeline for releasing a new version of pKa-ANI with updated models?

isayev commented 4 months ago

@lpravda, thanks for reporting. We will look into updating the models for the latest scikit-learn and Python.