theislab / scgen

Single cell perturbation prediction
https://scgen.readthedocs.io
GNU General Public License v3.0

How to reproduce paper results #80

Open SZ-qing opened 1 year ago

SZ-qing commented 1 year ago

Hi, I am reproducing the results of your article, but I have run into some environment problems.

As described in https://github.com/theislab/scgen-reproducibility, I set up a virtual environment called scgen-reproduce with the packages tensorflow, scanpy, numpy, matplotlib, scipy, and wget. Following the instructions, I changed into the code directory and ran python ModelTrainer.py all (all datasets have already been downloaded), but got the following error:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/public/home/nierq01/platform/scGen/reproduce/scgen-reproducibility-master/code/scgen/__init__.py", line 19, in <module>
    __version__ = get_version(__file__)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/get_version/__init__.py", line 280, in get_version
    raise NoVersionFound(Source.all, msg)
get_version.NoVersionFound: No version found:

My understanding is that reproducing the results does not require installing scgen into the Python environment yourself. Is that right?

SZ-qing commented 1 year ago

Alternatively, could you tell me which versions of Python and TensorFlow are required?
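When comparing environments in a thread like this, it can help to report the exact installed versions of the packages that matter. A minimal sketch, assuming Python 3.8+ (or the importlib-metadata backport on Python 3.7); report_versions is a hypothetical helper, and the package list is just an example:

```python
import sys

try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib-metadata backport
    from importlib_metadata import version, PackageNotFoundError


def report_versions(packages):
    """Map each distribution name to its installed version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("python", sys.version.split()[0])
for name, ver in report_versions(
    ["tensorflow", "anndata", "scanpy", "scikit-learn", "get_version"]
).items():
    print(name, ver)
```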

SZ-qing commented 1 year ago

Which anndata (or scanpy) version was used to create the h5ad files? I currently can't read the h5ad files you provided. My package versions:

Package Version
absl-py 1.4.0
adjustText 0.8
aiohttp 3.8.4
aiosignal 1.3.1
anndata 0.8.0
astunparse 1.6.3
async-timeout 4.0.2
asynctest 0.13.0
cached-property 1.5.2
cachetools 5.3.0
certifi 2022.12.7
charset-normalizer 3.1.0
chex 0.1.5
cycler 0.11.0
dm-tree 0.1.8
docrep 0.3.2
dunamai 1.16.0
et-xmlfile 1.1.0
etils 0.9.0
exceptiongroup 1.1.1
flatbuffers 23.3.3
flax 0.6.4
fonttools 4.38.0
frozenlist 1.3.3
fsspec 2023.1.0
gast 0.4.0
get_version 3.5.4
google-auth 2.17.3
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
grpcio 1.54.0
h5py 3.8.0
idna 3.4
importlib-metadata 6.6.0
iniconfig 2.0.0
jax 0.3.25
jaxlib 0.3.25
joblib 1.2.0
keras 2.11.0
kiwisolver 1.4.4
libclang 16.0.0
llvmlite 0.39.1
Markdown 3.3.4
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.5.3
mdurl 0.1.2
ml-collections 0.1.1
msgpack 1.0.5
mudata 0.2.1
multidict 6.0.4
multipledispatch 0.6.0
natsort 8.3.1
networkx 2.6.3
numba 0.56.4
numpy 1.21.6
numpyro 0.10.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
oauthlib 3.2.2
openpyxl 3.1.2
opt-einsum 3.3.0
optax 0.1.4
orbax 0.1.0
packaging 23.1
pandas 1.3.5
patsy 0.5.3
Pillow 9.5.0
pip 23.1.2
pluggy 1.0.0
protobuf 3.19.6
pyasn1 0.5.0
pyasn1-modules 0.3.0
pyDeprecate 0.3.2
pynndescent 0.5.7
pyparsing 3.0.9
pyro-api 0.1.2
pyro-ppl 1.8.4
pytest 7.2.2
python-dateutil 2.8.2
pytz 2023.3
requests 2.30.0
requests-oauthlib 1.3.1
rich 13.3.3
rsa 4.9
scanpy 1.9.3
scikit-learn 1.0.2
scipy 1.7.3
seaborn 0.12.2
session-info 1.0.0
setuptools 67.7.2
six 1.16.0
statsmodels 0.13.5
stdlib-list 0.8.0
tensorboard 2.11.2
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.11.0
tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.32.0
tensorstore 0.1.28
termcolor 2.3.0
threadpoolctl 3.1.0
toolz 0.12.0
torch 1.13.1+cpu
torchaudio 0.13.1+cpu
torchmetrics 0.11.4
torchvision 0.10.0a0+e04d001.dtk2210
tqdm 4.65.0
typing_extensions 4.5.0
umap-learn 0.5.3
urllib3 2.0.2
Werkzeug 2.2.3
wget 3.2
wheel 0.40.0
wrapt 1.15.0
yarl 1.8.2
zipp 3.15.0

The error message is as follows:

Traceback (most recent call last):
  File "./vec_arith_pca.py", line 144, in <module>
    train("pbmc", "CD4T", "unbiased")
  File "./vec_arith_pca.py", line 119, in train
    ctrl_CD4T_PCA = pca.transform(adata_list[1].X)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/decomposition/_base.py", line 117, in transform
    X = self._validate_data(X, dtype=[np.float64, np.float32], reset=False)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/base.py", line 566, in _validate_data
    X = check_array(X, **check_params)
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/utils/validation.py", line 726, in check_array
    accept_large_sparse=accept_large_sparse,
  File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/sklearn/utils/validation.py", line 441, in _ensure_sparse_format
    "A sparse matrix was passed, but dense "
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

yikang613 commented 9 months ago

I ran into the same issue. I simply moved ModelTrainer.py from the code directory to the main (top-level) directory of the repository. I think that could solve this problem.
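Whether moving the script helps probably depends on which scgen package Python finds first. When a script is run as python script.py, its own directory is placed at the front of sys.path, so running from code/ can pick up the bundled code/scgen package, whose __init__.py is where the NoVersionFound error is raised. This is an assumption about the cause, not a confirmed diagnosis. A small check that could be run from each directory (locate_module is a hypothetical helper name):

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file a top-level import of `name` would load, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # sys.path[0] is the directory of the script being run, so the result
    # depends on where this file lives (e.g. code/ vs. the repository root).
    print("sys.path[0]:", sys.path[0])
    print("scgen resolves to:", locate_module("scgen"))
```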

twytock commented 7 months ago

Hi @SZ-qing,

Based on the error message, pca.transform expects a dense array, but the AnnData object's data matrix (adata_list[1].X) is sparse. Changing line 119 of vec_arith_pca.py from

    ctrl_CD4T_PCA = pca.transform(adata_list[1].X)

to

    ctrl_CD4T_PCA = pca.transform(adata_list[1].X.toarray())

should fix this error.
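The one-line change assumes adata_list[1].X is always sparse. If a file happens to load with a dense X, calling .toarray() on a NumPy array raises AttributeError instead. Below is a minimal sketch of a conversion that handles both cases. to_dense is a hypothetical helper that is not part of scgen or the reproducibility scripts, and the data is synthetic:

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import PCA


def to_dense(matrix):
    """Return a dense NumPy array, converting only if `matrix` is a SciPy sparse matrix."""
    if sp.issparse(matrix):
        return matrix.toarray()
    return np.asarray(matrix)


# Synthetic stand-in for adata_list[1].X: a sparse 50-cell x 20-gene matrix.
X_sparse = sp.random(50, 20, density=0.1, format="csr", random_state=0)

# In vec_arith_pca.py the PCA is fitted elsewhere; here it is fitted on the toy data.
pca = PCA(n_components=5)
pca.fit(to_dense(X_sparse))

# Passing X_sparse directly is what raised the TypeError in the traceback above
# (scikit-learn 1.0.2); converting first gives PCA the dense input it expects.
X_pca = pca.transform(to_dense(X_sparse))
print(X_pca.shape)  # (50, 5)
```

Densifying a full single-cell matrix can use a lot of memory. This is fine for the dataset sizes in the paper but worth keeping in mind for larger data.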

@yikang613: Did your solution work?

printfisnotgood commented 7 months ago

> Hi, I am reproducing the results of your article, but I have run into some environment problems.
>
> As described in https://github.com/theislab/scgen-reproducibility, I set up a virtual environment called scgen-reproduce with the packages tensorflow, scanpy, numpy, matplotlib, scipy, and wget. Following the instructions, I changed into the code directory and ran python ModelTrainer.py all (all datasets have already been downloaded), but got the following error:
>
> Traceback (most recent call last):
>   File "", line 1, in <module>
>   File "/public/home/nierq01/platform/scGen/reproduce/scgen-reproducibility-master/code/scgen/__init__.py", line 19, in <module>
>     __version__ = get_version(__file__)
>   File "/public/home/nierq01/anaconda3/envs/scgen-reproduce/lib/python3.7/site-packages/get_version/__init__.py", line 280, in get_version
>     raise NoVersionFound(Source.all, msg)
> get_version.NoVersionFound: No version found:
>
>   • Directory name: name of directory “/public/home/nierq01/platform/scGen/reproduce/scgen-reproducibility-master/code” does not contain a valid version.
>   • VCS: could not find VCS from directory “/public/home/nierq01/platform/scGen/reproduce/scgen-reproducibility-master/code”.
>   • Package metadata: could not find distribution “scgen”.
>
> My understanding is that reproducing the results does not require installing scgen into the Python environment yourself. Is that right?

You could try pip install get_version==2.2 to get an older version of this module, since the paper was published about four years ago. I ran into this problem too, and this fixed it for me.