phbradley / conga

Clonotype Neighbor Graph Analysis
MIT License

weird GEX and TCR UMAPs #31

Closed: coro1c closed this issue 2 years ago

coro1c commented 2 years ago

Hello,

I recently installed CoNGA and tried it on the human PBMC dataset. When I plot the GEX and TCR UMAPs, I get weird maps that don't resemble the ones I get with the Google Colab notebook on the same example dataset. I know that weird GEX UMAPs and clusters were reported when the GEX principal components were dominated by individual genes (Update 2021-09-10), but I have a similar problem with the TCR UMAP. I checked the preprocess.py file, and the change from 2021-09-10 is already integrated there. In addition, the plots I get for the GEX principal components and the TCRdist kernel principal components (adata.obsm['X_pca_gex'] / adata.obsm['X_pca_tcr']) are similar to the ones I obtain with the Google Colab notebook. Do you know what the reason for this might be?

Thanks for your help! Marie

[Image: UMAPs_from_test_Dataset_human_PBMCs]
[Image: UMAPs_from_test_Dataset_human_PBMCs_GEX_PC_TCRdist_kernel_PC]

phbradley commented 2 years ago

Hi Marie, thanks for trying CoNGA! I agree, those are very weird UMAPs! I have never seen ones like that, and off the top of my head I can't think why that might be happening. Can you confirm which dataset you are analyzing, and whether you are using the command line run_conga.py script or a Jupyter notebook? Also, maybe provide some version information for scanpy and related packages: this would be written out by the run_conga.py script.

coro1c commented 2 years ago

Thanks for the fast reply. Yes, I am using the same dataset; I double-checked the download link. If I plot the PCAs I get the same result as in the Google Colab notebook, and I also get the same outputs for all previous steps. I am currently using Jupyter Notebook with scanpy version 1.7.2 (I know that there is a newer version in the Google Colab notebook, but I didn't manage to upgrade scanpy). The other packages are:

anndata 0.7.8
scanpy 1.7.2
sinfo 0.3.4

PIL 8.3.2
cached_property 1.5.2
cairocffi 1.2.0
cffi 1.14.6
cycler 0.10.0
dateutil 2.8.2
defusedxml 0.7.1
get_version 2.1
h5py 3.1.0
igraph 0.9.1
joblib 1.1.0
kiwisolver 1.3.1
legacy_api_wrap 1.2
leidenalg 0.8.4
llvmlite 0.36.0
louvain 0.7.0
matplotlib 3.3.4
natsort 8.0.0
numba 0.53.1
numexpr 2.7.3
numpy 1.19.5
packaging 21.0
pandas 1.1.5
pyparsing 3.0.5
pytz 2021.3
scipy 1.5.3
six 1.16.0
sklearn 0.24.2
tables 3.6.1
texttable 1.6.4
wcwidth 0.2.5
yaml 5.4.1

phbradley commented 2 years ago

Great, that's helpful! Actually, I was asking you to post here the details of exactly which dataset you are looking at (filename, download link). We've looked at a lot of different 10x PBMC datasets, so I want to make sure I am on the same page. Have you tried the run_conga.py command line script? That would generate a log file that we could look at to diagnose the problem. Bottom line, this is going to be very hard to troubleshoot without more information. Perhaps you could post or email me the Jupyter notebook that led to the funny UMAPs (pbradley@fredhutch.org)?

phbradley commented 2 years ago

I just heard from another user seeing something similar (also using scanpy 1.7.2). Any chance you could run this command to get a bit more version information? (I don't see a umap version up there, for example.) This is what it prints for me:

In [6]: import scanpy as sc
In [7]: sc.logging.print_header()
scanpy==1.6.1 anndata==0.7.5 umap==0.4.6 numpy==1.19.5 scipy==1.5.3 pandas==1.1.5 scikit-learn==0.24.1 statsmodels==0.12.1 python-igraph==0.8.3 louvain==0.7.0 leidenalg==0.8.3

Also, what is your Python version? Thanks for your help figuring this out!
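
For environments where scanpy itself is hard to import, the same kind of information can be collected with the standard library alone. The following is a minimal sketch, not part of CoNGA or scanpy: the helper names `installed_version` and `version_report` and the package list are illustrative assumptions, and the output only imitates the `name==version` style of the scanpy header shown above, with the Python version added.

```python
import sys

try:
    # Python 3.8+ ships importlib.metadata in the standard library.
    from importlib.metadata import PackageNotFoundError, version as dist_version
except ImportError:
    dist_version = None  # older interpreter; fall back to pkg_resources below

# Distribution names as published on PyPI; the selection is an illustrative choice.
PACKAGES = ["scanpy", "anndata", "umap-learn", "numpy", "scipy", "pandas",
            "scikit-learn", "python-igraph", "louvain", "leidenalg"]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is not found."""
    if dist_version is not None:
        try:
            return dist_version(dist_name)
        except PackageNotFoundError:
            return None
    # Fallback for interpreters older than 3.8 (requires setuptools).
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


def version_report(packages=PACKAGES):
    """Build a one-line 'name==version' report that starts with the Python version."""
    parts = ["python==%d.%d.%d" % sys.version_info[:3]]
    for name in packages:
        parts.append("%s==%s" % (name, installed_version(name) or "not found"))
    return " ".join(parts)


if __name__ == "__main__":
    print(version_report())
```

Running the file in the same environment (or kernel) as the notebook gives a single line that can be pasted into an issue.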

phbradley commented 2 years ago

Hi there, it looks like this is a problem with scanpy version 1.7.2; see the issue below. I would try updating to scanpy version 1.8.2.

https://github.com/theislab/scanpy/issues/2045
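
A quick way to tell whether an environment has the version discussed here is to compare the installed scanpy version against it. This is a minimal sketch under stated assumptions: only 1.7.2 is flagged, because that is the only version this thread reports as affected (1.8.2 is reported as working), and the helper names `parse_release` and `is_reported_affected` are hypothetical, not part of scanpy or CoNGA.

```python
import re

# Only the version reported as affected in this thread; nothing else is assumed.
AFFECTED = {(1, 7, 2)}


def parse_release(version_string):
    """Extract the leading numeric release, e.g. '1.7.0rc1' -> (1, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip())
    if match is None:
        raise ValueError("no numeric version in %r" % version_string)
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad to three components so that '1.8' becomes (1, 8, 0).
    return parts + (0,) * (3 - len(parts))


def is_reported_affected(version_string):
    """True if the version matches one reported as producing the odd UMAPs."""
    return parse_release(version_string) in AFFECTED


if __name__ == "__main__":
    try:
        import scanpy
    except ImportError:
        print("scanpy is not installed in this environment")
    else:
        if is_reported_affected(scanpy.__version__):
            print("scanpy %s: version reported as affected" % scanpy.__version__)
        else:
            print("scanpy %s: not the version reported in this thread" % scanpy.__version__)
```

The check deliberately matches the exact version rather than "anything older than 1.8.2", since the thread gives no evidence about other releases.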

Teerapon789 commented 2 years ago

Hi, I also had this problem. I upgraded scanpy from 1.7.2 to 1.8.2 and that solved it.

coro1c commented 2 years ago

Hi, thanks for all the help. I will try updating scanpy to version 1.8.2.

coro1c commented 2 years ago

Hi, upgrading scanpy resolved the issue. Thanks!

AlicePsyche commented 2 years ago

Hi Marie,

I have the same issue. I checked the scanpy version, and it is indeed 1.7.2. But when I tried to upgrade it, neither pip install scanpy==1.8.2 nor pip install --upgrade scanpy worked for me. It said:

ERROR: Could not find a version that satisfies the requirement scanpy==1.8.2 (from versions: 0.2.1, 0.2.3, 0.2.3.4, 0.2.3.5, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.9.1, 0.3, 0.3.1, 0.3.2, 0.4, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 1.0, 1.0.1, 1.0.1.post1, 1.0.2, 1.0.3, 1.0.3.post1, 1.0.4, 1.1a1, 1.1a2, 1.1, 1.2.0, 1.2.1, 1.2.2, 1.3, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.3.8, 1.4, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.4.post1, 1.4.5, 1.4.5.post1, 1.4.5.post2, 1.4.5.post3, 1.4.5.1, 1.4.6, 1.5.0a1, 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.7.0rc1, 1.7.0, 1.7.1, 1.7.2)
ERROR: No matching distribution found for scanpy==1.8.2

Could you please show me how you did it? Thanks!

coro1c commented 2 years ago

Hi Alice,

In my case, the problem was that I had a Python version in my virtual environment that was too old and not compatible with scanpy 1.8.2.

To solve this, I created a new virtual environment and installed CoNGA again. In the command given for creating the CoNGA virtual environment, I changed the Python version to 3.9 (conda create -n conga_new_env ipython python=3.9). Python 3.10 didn't work for me: when I tried to install scanpy I got the error “ResolvedPackageNotFound: python 3.1”. This seems to be a known conda problem with Python 3.10, where the version is apparently read as 3.1 (https://stackoverflow.com/questions/69481608/cannot-set-up-a-conda-environment-with-python-3-10, https://github.com/conda/conda/issues/10969). So I stuck with Python 3.9, and with this CoNGA works perfectly fine.

Then I simply followed the instructions given to install CoNGA.
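
The fix above comes down to the interpreter version, so it can help to check it before reinstalling. This is a minimal sketch, not an official compatibility check: the lower bound (3, 7) is an assumption (the thread only says the old Python was too old for scanpy 1.8.2), the upper bound reflects only that 3.9 worked and 3.10 failed under conda in this thread, and the function name `python_looks_compatible` is hypothetical.

```python
import sys

# Assumed bounds, not taken from CoNGA or scanpy documentation:
# this thread reports Python 3.9 working and 3.10 failing under conda.
ASSUMED_MIN = (3, 7)
KNOWN_GOOD_MAX = (3, 9)


def python_looks_compatible(version_info=None,
                            minimum=ASSUMED_MIN,
                            maximum=KNOWN_GOOD_MAX):
    """Return True if (major, minor) lies within [minimum, maximum]."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return minimum <= major_minor <= maximum


# Two ways a "3.10" can degrade to "3.1": numeric parsing drops the trailing
# zero, and keeping only the first three characters cuts it off. Whether
# either is what conda hit here is an assumption, not something the thread shows.
assert str(float("3.10")) == "3.1"
assert "3.10"[:3] == "3.1"

if __name__ == "__main__":
    print("running Python %d.%d" % sys.version_info[:2])
    print("within the range tested in this thread:", python_looks_compatible())
```

A result of False does not prove that the installation will fail; it only means the interpreter is outside the range that was reported to work here.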

AlicePsyche commented 2 years ago

Thank you!!! It worked. Yes, I had the same issue as you: an older version of Python. After five days, I finally ran CoNGA on my own datasets. Many thanks!