yollct / spycone

Splicing-aware time-course network enricher - exploratory analysis for transcriptomics and/or proteomics time series data
GNU General Public License v3.0

Installation error from biopython and not able to reproduce tutorial #3

Open LPChaumont opened 7 months ago

LPChaumont commented 7 months ago

Hi @yollct ,

I wanted to try spycone, but ran into some trouble during the installation process. I tried to install spycone in a virtual environment following the instructions in the repository:

python -m venv .spycone
source .spycone/bin/activate
python -m pip install --upgrade pip
python -m pip install https://github.com/fraenkel-lab/pcst_fast/archive/refs/tags/1.0.7.tar.gz
python -m pip install spycone

I got the following warning and error message:

/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/tslearn/bases/bases.py:15: UserWarning: h5py not installed, hdf5 features will not be supported.
Install h5py to use hdf5 features: http://docs.h5py.org/
  warn(h5py_msg)
{
    "name": "ImportError",
    "message": "cannot import name 'GC' from 'Bio.SeqUtils' (/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/Bio/SeqUtils/__init__.py)",
    "stack": "---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 import spycone as spy

File ~/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/spycone/__init__.py:11
      9 from .run_domino import run_domino, run_domain_domino
     10 from .DOMINO.src.core import domino
---> 11 from .splicingfactor import SF_coexpression, SF_motifsearch
     12 #from ._NEASE import nease

File ~/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/spycone/splicingfactor.py:14
     12 from scipy.stats import pearsonr
     13 from scipy.stats import mannwhitneyu, fisher_exact, kruskal
---> 14 from Bio.SeqUtils import GC
     15 from joblib import Parallel, delayed
     16 import gc

ImportError: cannot import name 'GC' from 'Bio.SeqUtils' (/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/Bio/SeqUtils/__init__.py)"
}

After some digging, I found this GitHub issue, which also mentions biopython#4622. I downgraded biopython from 1.83 to 1.80 with python -m pip install biopython==1.80 and the error message is gone, but I still get the warning message about hdf5.

After that, I tried to reproduce the tutorial in your documentation and it didn't work. Both the gene-level and transcript-level workflows return the same error message. I strictly followed the documentation, but when I run the code for spy.dataset(...) it returns this error:

{
    "name": "TypeError",
    "message": "dataset.__init__() got an unexpected keyword argument 'keytype'",
    "stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 flu_dset = spy.dataset(ts=flu_ts,
      2                         gene_id = gene_list,
      3                         symbs=gene_list,
      4                         species=9606,
      5                         keytype='entrezgeneid',
      6                         reps1 = 5,
      7                         timepts = 9)

TypeError: dataset.__init__() got an unexpected keyword argument 'keytype'"
}

I do not know what is wrong and any help would be appreciated! Spycone looks great, I would like to give it a try on my own data after that.

LP

LPChaumont commented 6 months ago

Maybe the documentation is out of date with the code: the 'keytype' argument used in this call from the gene-level workflow is not present in DataSet.py.

flu_dset = spy.dataset(ts=flu_ts,
                        gene_id = gene_list,
                        symbs=gene_list,
                        species=9606,
                        keytype="entrezgeneid",
                        reps1 = 5,
                        timepts = 9)

When I removed the argument 'keytype', it seems to work until

c = asclu.find_clusters()
TypeError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 c = asclu.find_clusters()

File ~/.local/lib/python3.10/site-packages/spycone/clustering.py:154, in clustering.find_clusters(self)
    152 bestsim = -np.inf
    153 if self.algorithm == "hierarchical" and self.linkage != "ward":
--> 154     cluster = func(affinity="precomputed", linkage=self.linkage, n_clusters=self.n_clusters).fit(dist)
    155     #sil = silhouette_score(dist, cluster.labels_, metric="precomputed")
    157 elif self.algorithm == "hierarchical" and self.linkage=="ward":

TypeError: AgglomerativeClustering.__init__() got an unexpected keyword argument 'affinity'

Have a good day,

LP

yollct commented 6 months ago

Hi LP,

Thank you very much for your interest in Spycone, and sorry for the late reply.

First, thank you for pointing out the out-of-date documentation. You are right, I removed the keytype parameter, and I will update the documentation accordingly. The same goes for the biopython version: GC is no longer available in the newest version of biopython.

The clustering error is caused by the newest version of scikit-learn, in which the affinity parameter has been removed. Since they are still changing this for the next version, I would suggest downgrading scikit-learn to version <=1.2.2. I will adapt the code to their newest version when version 1.6 comes out.
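If pinning scikit-learn is not an option, one possible stopgap is to pick the keyword name at runtime. This is a sketch under assumptions rather than spycone's actual code: it assumes newer scikit-learn releases accept `metric="precomputed"` where older ones took `affinity="precomputed"`, and `precomputed_distance_kwargs` is a hypothetical helper name introduced here for illustration.

```python
import inspect


def precomputed_distance_kwargs(estimator_cls):
    """Return the keyword that tells estimator_cls the input is a distance matrix.

    Prefers 'metric' (assumed to be the newer name) and falls back to
    'affinity' (the older name) based on the constructor's signature.
    """
    params = inspect.signature(estimator_cls.__init__).parameters
    if "metric" in params:
        return {"metric": "precomputed"}
    if "affinity" in params:
        return {"affinity": "precomputed"}
    raise TypeError(
        f"{estimator_cls.__name__} accepts neither 'metric' nor 'affinity'"
    )
```

The failing call in clustering.py could then, hypothetically, be written as `func(linkage=self.linkage, n_clusters=self.n_clusters, **precomputed_distance_kwargs(func)).fit(dist)`, so the same line works on either side of the rename.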

Best, Chit Tong