dtype error #22

Open dwbl101 opened 10 months ago

dwbl101 commented 10 months ago

Hi! I tried to run the peakachu score_genome function with the following command: _peakachu score_genome -r 5000 --balance -p /workdir/nf/inter_30.hic -O nf-peakachu-5kb-scores.bedpe -m /model/high-confidence.600million.5kb.w6.pkl_

However, I got an error that seems to be related to the dtype:

```
/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/sklearn/base.py:348: InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 1.1.2 when using version 1.3.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
  warnings.warn(
Traceback (most recent call last):
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/bin/peakachu", line 91, in <module>
    run()
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/bin/peakachu", line 87, in run
    args.func(args)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/peakachu/score_genome.py", line 14, in main
    model = joblib.load(args.model)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 658, in load
    obj = _unpickle(fobj, filename, mmap_mode)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 577, in _unpickle
    obj = unpickler.load()
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 402, in load_build
    Unpickler.load_build(self)
  File "/share/home/lhl_zhulin/miniconda3/envs/juicer/lib/python3.8/pickle.py", line 1705, in load_build
    setstate(state)
  File "sklearn/tree/_tree.pyx", line 728, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1432, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
```

It seems to be an incompatible dtype error. My .hic file was produced by juicer1.6, so I don't think the file itself has a malformed format. Why does this error happen, and what can I do to make the input compatible with the software? Thank you very much!

tariks commented 10 months ago

Thank you for reporting this.

Based on the error message and traceback, my first guess is that this is caused by differences in sklearn versions. My second guess is that something is causing the Tree setstate input to become malformed. My third guess is a common Python environment issue, such as not using the appropriate binary for your machine -- M1 MacBook users know this pain.

In any case, the input expected by Tree's setstate is a dict with three keys: names, formats, and itemsize. What it got definitely looks like it came from a tree, but looks like a list of sets, not a dict.
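The first guess can be checked directly from the warning at the top of the traceback, which names both the scikit-learn version that pickled the model and the version installed in the environment. Below is a minimal sketch that pulls those two versions out of the warning text and compares them. The helper names (`versions_from_warning`, `same_release_series`) are hypothetical, and treating "same major.minor" as compatible is only a rough heuristic, since scikit-learn does not promise that pickles load across versions.

```python
import re

# Warning text copied from the traceback reported above.
WARNING = (
    "Trying to unpickle estimator DecisionTreeClassifier from version 1.1.2 "
    "when using version 1.3.1. This might lead to breaking code or invalid results."
)


def versions_from_warning(text):
    """Return (pickled_version, installed_version) parsed from the warning, or None."""
    match = re.search(
        r"from version ([\d.]*\d) when using version ([\d.]*\d)", text
    )
    return (match.group(1), match.group(2)) if match else None


def same_release_series(a, b):
    """Heuristic (an assumption, not a guarantee): compare only major.minor."""
    return a.split(".")[:2] == b.split(".")[:2]


pickled, installed = versions_from_warning(WARNING)
print("model pickled with scikit-learn", pickled)
print("environment has scikit-learn", installed)
print("same release series:", same_release_series(pickled, installed))
```

If the two versions differ, as they do here, installing the scikit-learn version the model was pickled with is the most direct thing to try.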

I hope @XiaoTaoWang can think of a simple solution. I will take a look through recent changelogs related to pickle/sklearn if I get the time for it. But no promises.

Best next steps:

Thanks again

dwbl101 commented 10 months ago

Thanks for your thoughtful suggestions! Here is the output of running conda list:

I installed peakachu with conda, and I am running it on a Linux server managed by Slurm. As you said, the Python environment may be causing this problem, so I will create a new environment and try again. Thank you again for your thoughtful help!

tariks commented 10 months ago

No problem :)

Your output shows installs from several channels -- pypi, anaconda, etc. While conflicts between pip and conda are better managed now than in the past, conda's dependency management still gets confused sometimes. Usually something like this happens:

I make a conda env. I pip install something. pip upgrades matplotlib or whatever. conda gets confused. turns out something installed by conda breaks if matplotlib updates. Not sure if this is really what happens, but close enough.
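To see at a glance how mixed an environment is, the output of conda list can be tallied by channel. The sketch below assumes the usual four-column layout (name, version, build, channel) and assumes that a row with no channel column came from the default channel. The sample rows are made up for illustration and are not the output posted above, and `channels_in_conda_list` is a hypothetical helper.

```python
from collections import Counter

# Made-up rows in the layout printed by `conda list`; not the real output.
SAMPLE = """\
# packages in environment at /path/to/env:
#
# Name                    Version                   Build  Channel
pkg-a                     1.0.0                    pypi_0    pypi
pkg-b                     2.1.0                   build_0    conda-forge
pkg-c                     0.9                     build_0    bioconda
pkg-d                     3.8.0                   build_0
"""


def channels_in_conda_list(text):
    """Count packages per channel in `conda list` style output."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        fields = line.split()
        # Assumption: a missing fourth column means the default channel.
        channel = fields[3] if len(fields) > 3 else "defaults"
        counts[channel] += 1
    return counts


for channel, n in channels_in_conda_list(SAMPLE).most_common():
    print(f"{channel}: {n}")
```

Rows tagged pypi were installed with pip, so a long pypi list next to conda channels is a hint that the two installers may have stepped on each other.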

The other issue is that anaconda's default channel does not always have architecture-specific binaries for a package. The preferred channel is conda-forge, which is better maintained and more reliable. Your channel priority should be conda-forge > bioconda > defaults.

miniconda-forge on github explains things more. I like mamba / micromamba, but editing your regular miniconda config should work just as well.

To avoid most python env gotchas, take this advice: build your conda env all in one go, as in, specify all the libraries you need at creation time. Their dependencies will all get resolved together. Ideally, you never modify the env. If you need to install something new, prefer conda install over pip. If you pip install something, then use only pip after that.
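As a rough illustration of the "all in one go" advice, the sketch below assembles a single conda create command that names every package and lists the channels in priority order. The environment name and the helper `conda_create_command` are hypothetical. The pins python=3.8 and scikit-learn=1.1.2 are taken from the traceback and warning earlier in this thread, and whether conda can actually solve this exact combination is an assumption that has not been checked. The script only prints the command; it does not run it.

```python
import shlex


def conda_create_command(env_name, packages, channels):
    """Build a `conda create` argument list with channels in priority order."""
    cmd = ["conda", "create", "-n", env_name]
    for channel in channels:
        # Channels given earlier on the command line take priority.
        cmd += ["-c", channel]
    cmd += list(packages)
    return cmd


cmd = conda_create_command(
    "peakachu-env",  # hypothetical environment name
    ["python=3.8", "peakachu", "scikit-learn=1.1.2"],
    ["conda-forge", "bioconda"],
)
print(shlex.join(cmd))
```

Printing first makes it easy to review the full package list before anything is installed; the printed line can then be pasted into a shell.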

Let's try this first and look for another solution if it doesn't resolve the issue.

Good luck!

tariks commented 10 months ago

Checking in: were you able to get things working?