tariks / peakachu

Genome-wide contact analysis using sklearn
MIT License

dtype error #22

Open dwbl101 opened 10 months ago

dwbl101 commented 10 months ago

Hi! I tried to run the peakachu score_genome function with the following command: `peakachu score_genome -r 5000 --balance -p /workdir/nf/inter_30.hic -O nf-peakachu-5kb-scores.bedpe -m /model/high-confidence.600million.5kb.w6.pkl`

However, I got an error that is probably related to the dtype:

```
/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/sklearn/base.py:348: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.1.2 when using version 1.3.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/bin/peakachu", line 91, in <module>
    run()
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/bin/peakachu", line 87, in run
    args.func(args)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/peakachu/score_genome.py", line 14, in main
    model = joblib.load(args.model)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 402, in load_build
    Unpickler.load_build(self)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/pickle.py", line 1705, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 728, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1432, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
```

It seems to be an incompatible dtype error. My .hic file was produced by juicer1.6, and I don't think its format is malformed. So how did this error happen? And what can I do to change its format so that it works with the software? Thank you very much!

tariks commented 10 months ago

Thank you for reporting this.

Based on the error message and traceback, my first guess is that this is caused by differences in sklearn versions. My second guess is that something is causing the input to Tree's `__setstate__` to become malformed. My third guess is a common Python environment issue, such as not using the appropriate binary for your machine -- M1 MacBook users know this pain.

In any case, the input expected by Tree's `__setstate__` is a dict with three keys: names, formats, and itemsize. What it got definitely looks like it came from a tree, but it looks like a list of sets, not a dict.
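A quick way to test the first guess is to compare the two versions named in the `InconsistentVersionWarning` above. The sketch below only parses the warning text; the helper names `parse_versions` and `same_minor` are made up for illustration, and treating a major.minor difference as "risky" is an assumption, since scikit-learn does not promise that pickles load across versions at all.

```python
import re

# Hypothetical helper: pull the two versions out of the warning text.
VERSION_RE = re.compile(
    r"from version (\d+(?:\.\d+)*) when using version (\d+(?:\.\d+)*)"
)

def parse_versions(warning_text):
    """Return (pickled_with, running) or None if the text does not match."""
    match = VERSION_RE.search(warning_text)
    return match.groups() if match else None

def same_minor(a, b):
    """True when two version strings share the same major.minor prefix."""
    return a.split(".")[:2] == b.split(".")[:2]

# Warning text copied from the traceback in this issue.
warning = (
    "Trying to unpickle estimator DecisionTreeClassifier from version 1.1.2 "
    "when using version 1.3.1. This might lead to breaking code or invalid results."
)

pickled_with, running = parse_versions(warning)
if not same_minor(pickled_with, running):
    print(f"Model was saved with sklearn {pickled_with}, but {running} is installed.")
    print(f"Installing scikit-learn=={pickled_with} in a fresh env is worth trying.")
```

If the versions differ, as they do here, pinning scikit-learn to the version the model was saved with is the least invasive thing to try before looking anywhere else.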

I hope @XiaoTaoWang can think of a simple solution. I will take a look through recent changelogs related to pickle/sklearn if I get the time for it. But no promises.

Best next steps:

Thanks again

dwbl101 commented 10 months ago

Thanks for your thoughtful suggestions! Here is the output of running `conda list`:

# packages in environment at /share/home/lhl_zhulin/miniconda3/envs/juicer:
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
asciitree                 0.3.3                    pypi_0    pypi
bioframe                  0.5.0                    pypi_0    pypi
bwa                       0.7.17               h7132678_9    https://mirrors.bfsu.edu.cn/anaconda/cloud/bioconda
bwa-mem2                  2.2.1                hd03093a_2    https://mirrors.bfsu.edu.cn/anaconda/cloud/bioconda
bzip2                     1.0.8                h7b6447c_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
c-ares                    1.19.0               h5eee18b_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
ca-certificates           2023.7.22            hbcca054_0    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.3.0                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
contourpy                 1.1.1                    pypi_0    pypi
cooler                    0.9.3                    pypi_0    pypi
cooltools                 0.5.4                    pypi_0    pypi
curl                      7.88.1               h5eee18b_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
cycler                    0.12.1                   pypi_0    pypi
cython                    3.0.4                    pypi_0    pypi
cytoolz                   0.12.2                   pypi_0    pypi
dill                      0.3.7                    pypi_0    pypi
fonttools                 4.43.1                   pypi_0    pypi
gdbm                      1.18                 hd4cb3f1_4    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
h5py                      3.10.0                   pypi_0    pypi
hic-straw                 0.0.6                    pypi_0    pypi
idna                      3.4                      pypi_0    pypi
imageio                   2.31.5                   pypi_0    pypi
importlib-metadata        6.8.0                    pypi_0    pypi
importlib-resources       6.1.0                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
krb5                      1.19.4               h568e23c_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
lazy-loader               0.3                      pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
libcurl                   7.88.1               h91b91d3_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
libedit                   3.1.20221030         h5eee18b_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
libev                     4.33                 h7f8727e_1    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
libffi                    3.3                  he6710b0_2    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
libgcc-ng                 12.2.0              h65d4601_19    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
libnghttp2                1.46.0               hce63b2e_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
libssh2                   1.10.0               h8f2d780_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
libstdcxx-ng              13.2.0               h7e041cc_2    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
libzlib                   1.2.13               h166bdaf_4    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
llvm-openmp               14.0.6               h9e868ea_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
llvmlite                  0.41.0                   pypi_0    pypi
matplotlib                3.7.3                    pypi_0    pypi
multiprocess              0.70.15                  pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
networkx                  3.1                      pypi_0    pypi
numba                     0.58.0                   pypi_0    pypi
numpy                     1.24.4                   pypi_0    pypi
openssl                   1.1.1w               hd590300_0    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
packaging                 23.2                     pypi_0    pypi
pandas                    1.5.3                    pypi_0    pypi
peakachu                  2.2.post1                pypi_0    pypi
perl                      5.34.0               h5eee18b_2    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
pillow                    10.1.0                   pypi_0    pypi
pip                       23.2.1           py38h06a4308_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
pybind11                  2.11.1           py38h7f3f72f_2    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
pybind11-global           2.11.1           py38h7f3f72f_2    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
pyfaidx                   0.7.2.2                  pypi_0    pypi
pyparsing                 3.1.1                    pypi_0    pypi
python                    3.8.8                hdb3f193_5    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
python-dateutil           2.8.2                    pypi_0    pypi
python_abi                3.8                      2_cp38    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
pytz                      2023.3.post1             pypi_0    pypi
pywavelets                1.4.1                    pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
requests                  2.31.0                   pypi_0    pypi
samtools                  1.6                  hcd7b337_9    https://mirrors.bfsu.edu.cn/anaconda/cloud/bioconda
scikit-image              0.21.0                   pypi_0    pypi
scikit-learn              1.3.1                    pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
setuptools                68.0.0           py38h06a4308_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
simplejson                3.19.2                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
threadpoolctl             3.2.0                    pypi_0    pypi
tifffile                  2023.7.10                pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
toolz                     0.12.0                   pypi_0    pypi
typing-extensions         4.8.0                    pypi_0    pypi
urllib3                   2.0.7                    pypi_0    pypi
wheel                     0.41.2           py38h06a4308_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
xz                        5.2.10               h5eee18b_1    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
zipp                      3.17.0                   pypi_0    pypi
zlib                      1.2.13               h166bdaf_4    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge

I installed peakachu with conda, and I am running it on a Linux server managed by Slurm. As you said, the Python environment may be causing this problem, so I will create a new environment and try again. Thank you again for your thoughtful help!

tariks commented 10 months ago

No problem :)

Your output shows installs from several channels -- pypi, anaconda, etc. While conflicts between pip and conda are better managed now than in the past, conda's dependency management still gets confused sometimes. Usually something like this happens:

I make a conda env. I pip install something. pip upgrades matplotlib or whatever. conda gets confused. It turns out something installed by conda breaks if matplotlib updates. Not sure if this is really what happens, but close enough.
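To see how mixed an environment is, you can tally the Channel column of `conda list`. This is a rough sketch rather than a polished tool: the function name `channel_counts` is made up, the sample rows are a handful copied from the listing above, and treating a missing fourth column as the default channel is an assumption.

```python
from collections import Counter

# A few rows copied from the `conda list` output posted in this issue.
SAMPLE = """\
# Name                    Version                   Build  Channel
bwa                       0.7.17               h7132678_9    https://mirrors.bfsu.edu.cn/anaconda/cloud/bioconda
bzip2                     1.0.8                h7b6447c_0    https://mirrors.bfsu.edu.cn/anaconda/pkgs/main
joblib                    1.3.2                    pypi_0    pypi
libgcc-ng                 12.2.0              h65d4601_19    https://mirrors.bfsu.edu.cn/anaconda/cloud/conda-forge
numpy                     1.24.4                   pypi_0    pypi
scikit-learn              1.3.1                    pypi_0    pypi
"""

def channel_counts(conda_list_text):
    """Count packages per channel, using the last path segment of mirror URLs."""
    counts = Counter()
    for line in conda_list_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split()
        # Assumption: no fourth column means the package came from the default channel.
        channel = fields[3] if len(fields) > 3 else "defaults"
        counts[channel.rstrip("/").rsplit("/", 1)[-1]] += 1
    return counts

for channel, n in channel_counts(SAMPLE).most_common():
    print(f"{channel}: {n}")
```

In the full listing, scikit-learn, joblib, and numpy all come from pypi while the Python interpreter comes from a conda channel, which is exactly the kind of mix described above.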

The other issue is that anaconda's default channel does not always have architecture-specific binaries for a package. The preferred channel is conda-forge, which is better maintained and more reliable. Your channel priority should be conda-forge > bioconda > defaults.

The miniforge project on GitHub explains things in more detail. I like mamba / micromamba, but editing your regular miniconda config should work just as well.

To avoid most Python env gotchas, take this advice: build your conda env all in one go, that is, specify all the libraries you need at creation time. Their dependencies will all get resolved together. Ideally, you never modify the env. If you need to install something new, prefer conda install over pip. If you pip install something, then use only pip after that.

Let's try this first and look for another solution if it doesn't resolve the issue.
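As a sketch of the "all in one go" idea, the snippet below assembles a single `conda create` call with the channel order from above and a scikit-learn pin matching the version in the warning. The env name `peakachu-env` and the helper `conda_create_command` are hypothetical, and the package list is only an illustration; whether peakachu itself is available from these channels is not confirmed in this thread, so it is left out. Nothing is executed, the command is only printed.

```python
import shlex

def conda_create_command(env_name, packages, channels=("conda-forge", "bioconda")):
    """Build one `conda create` invocation so all dependencies resolve together."""
    cmd = ["conda", "create", "-n", env_name]
    for channel in channels:
        # Channels listed earlier on the command line take priority.
        cmd += ["-c", channel]
    cmd += list(packages)
    return cmd

# Illustrative package list (an assumption): pin sklearn to the version
# the model was pickled with, according to the warning in this issue.
packages = ["python=3.8", "scikit-learn=1.1.2", "joblib"]

cmd = conda_create_command("peakachu-env", packages)
print(shlex.join(cmd))  # copy and run this in a shell if it looks right
```

Printing the command instead of running it keeps the step reviewable, which matters here because the env should not be modified piecemeal afterwards.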

Good luck!

tariks commented 10 months ago

Checking in: were you able to get things working?