snap-stanford / UCE

UCE is a zero-shot foundation model for single-cell gene expression data
MIT License

AnnData eval issue: cannot unpack non-iterable NoneType object #5

Closed. viriditax closed this issue 2 months ago.

viriditax commented 6 months ago

I was trying a simple test locally on an M2 MacBook with this call:

python eval_single_anndata.py --adata_path chicken_heart.h5ad --dir res --species chicken

Using sample 4 layer model

chicken_heart.h5ad ERROR

Traceback (most recent call last):
  File "/scratch/UCE/eval_single_anndata.py", line 155, in <module>
    main(args, accelerator)
  File "/scratch/UCE/eval_single_anndata.py", line 83, in main
    processor.preprocess_anndata()
  File "/scratch/UCE/evaluate.py", line 93, in preprocess_anndata
    self.adata, num_cells, num_genes = \
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object

This is after a successful download of the model files:

Downloading ./model_files/species_chrom.csv from https://figshare.com/ndownloader/files/42706558 ...
100%|█████████████████████████████████████| 4.10M/4.10M [00:00<00:00, 4.27MiB/s]
Downloading ./model_files/species_offsets.pkl from https://figshare.com/ndownloader/files/42706555 ...
100%|█████████████████████████████████████████| 139/139 [00:00<00:00, 22.3kiB/s]
Downloading resmodel_files/protein_embeddings.tar.gz from https://figshare.com/ndownloader/files/42715213 ...
100%|█████████████████████████████████████| 2.74G/2.74G [02:19<00:00, 19.7MiB/s]
Done!
Downloading ./model_files/all_tokens.torch from https://figshare.com/ndownloader/files/42706585 ...
100%|█████████████████████████████████████| 2.98G/2.98G [02:31<00:00, 19.7MiB/s]
Using sample 4 layer model
Downloading ./model_files/4layer_model.torch from https://figshare.com/ndownloader/files/42706576 ...
100%|█████████████████████████████████████| 3.40G/3.40G [02:57<00:00, 19.2MiB/s]
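One detail visible in the download log: with --dir res, the protein embeddings were written to resmodel_files/ rather than res/model_files/. That is what you get when a directory and a relative path are joined by plain string concatenation instead of a path join. The sketch below only illustrates that difference; both helper names are hypothetical, and nothing in this thread confirms how the script actually builds the path or whether it is related to the error.

```python
import os


def naive_path(out_dir: str, rel: str) -> str:
    # Hypothetical: plain concatenation adds no separator between the parts.
    return out_dir + rel


def joined_path(out_dir: str, rel: str) -> str:
    # Hypothetical: os.path.join inserts the platform separator when missing.
    return os.path.join(out_dir, rel)


print(naive_path("res", "model_files/protein_embeddings.tar.gz"))
print(joined_path("res", "model_files/protein_embeddings.tar.gz"))
```

On macOS or Linux the first call yields resmodel_files/protein_embeddings.tar.gz (the path seen in the log) and the second yields res/model_files/protein_embeddings.tar.gz.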

I get the same issue with some previously prepared AnnData objects from human samples, with species set to human.

Yanay1 commented 6 months ago

What does the command look like when you run using a human dataset?

The uploaded datasets already have X_uce filled in.
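To tell whether an .h5ad file is one of these pre-embedded datasets, you can check its .obsm keys. A minimal sketch, assuming the embedding is stored under .obsm["X_uce"]; the helper name is hypothetical, and the SimpleNamespace stand-ins only mimic AnnData's .obsm so the snippet runs without anndata installed. With a real file you would pass the object returned by anndata.read_h5ad(...).

```python
from types import SimpleNamespace


def has_uce_embedding(adata, key: str = "X_uce") -> bool:
    """Return True if the embedding key (assumed "X_uce") is already in adata.obsm."""
    # AnnData's .obsm supports `key in adata.obsm`; a plain dict behaves the same.
    return key in adata.obsm


# Stand-in objects for illustration only (not real AnnData instances).
already_embedded = SimpleNamespace(obsm={"X_uce": [[0.1, 0.2]], "X_umap": [[1.0, 2.0]]})
raw_input = SimpleNamespace(obsm={"X_PCA": [[0.5]], "X_UMAP": [[0.3, 0.4]]})

print(has_uce_embedding(already_embedded))  # True
print(has_uce_embedding(raw_input))         # False
```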

I've uploaded a notebook that walks through how to embed new species like Chicken:

https://github.com/snap-stanford/UCE/blob/main/data_proc/Create%20New%20Species%20Files.ipynb

It might be better to try one of the other species like human first.

viriditax commented 6 months ago

python eval_single_anndata.py --adata_path scanpyscenic.h5ad --dir res --species human

which has these attributes:

AnnData object with n_obs × n_vars = 8806 × 32847
    obs: 'orig.ident', 'nCount_originalexp', 'nFeature_originalexp', 'sample', 'patient_id',  'anatomical_location', 'cell_type', 'cell_subtype', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'sizeFactor', 'originalexp_snn_res.0.5', 'seurat_clusters', 'ident'
    var: 'features'
    obsm: 'X_PCA', 'X_UMAP'

Yanay1 commented 6 months ago

What is the terminal output when you run that?

This can happen if there was an issue processing the anndata.
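That is consistent with the traceback: Python raises exactly this TypeError when a function that normally returns a tuple takes an error path, returns None, and the caller unpacks the result into three names. A minimal sketch; preprocess below is a hypothetical stand-in and does not reproduce UCE's own code.

```python
def preprocess(ok: bool):
    """Hypothetical stand-in: returns (adata, num_cells, num_genes) on success."""
    if not ok:
        # Error path: report the problem but return nothing, i.e. None.
        print("ERROR while processing the AnnData")
        return None
    return "adata", 100, 2000


adata, num_cells, num_genes = preprocess(True)  # unpacks fine

try:
    adata, num_cells, num_genes = preprocess(False)
except TypeError as err:
    print(err)  # cannot unpack non-iterable NoneType object
```

So the TypeError itself is only a symptom; the useful information is whatever the preprocessing step printed (or failed to print) before returning None.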

viriditax commented 6 months ago

Here it is:

$ python eval_single_anndata.py --adata_path chicken_heart.h5ad --dir res --species chicken
Using sample 4 layer model
**********************************
***********chicken_heart.h5ad ERROR***********
**********************************
Traceback (most recent call last):
  File "~/UCE/eval_single_anndata.py", line 155, in <module>
    main(args, accelerator)
  File "~/UCE/eval_single_anndata.py", line 83, in main
    processor.preprocess_anndata()
  File "~/UCE/evaluate.py", line 93, in preprocess_anndata
    self.adata, num_cells, num_genes = \
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object

I get the same error for every AnnData object.

Yanay1 commented 6 months ago

What is the terminal output when you try to embed a human dataset?

viriditax commented 6 months ago

I tried "human_lung_disease-001.h5ad" and got the same error. The anndata package version is 0.10.3.

In case it's helpful:

# Name              Version         Build            Channel
accelerate          0.25.0          pypi_0           pypi
accelerator         2023.11.3.dev1  pypi_0           pypi
anndata             0.10.3          pypi_0           pypi
array-api-compat    1.4             pypi_0           pypi
bottle              0.12.25         pypi_0           pypi
bzip2               1.0.8           h1de35cc_0
ca-certificates     2023.08.22      hecd8cb5_0
certifi             2023.11.17      pypi_0           pypi
charset-normalizer  3.3.2           pypi_0           pypi
contourpy           1.2.0           pypi_0           pypi
cycler              0.12.1          pypi_0           pypi
filelock            3.13.1          pypi_0           pypi
fonttools           4.45.1          pypi_0           pypi
fsspec              2023.10.0       pypi_0           pypi
h5py                3.10.0          pypi_0           pypi
huggingface-hub     0.19.4          pypi_0           pypi
idna                3.6             pypi_0           pypi
jinja2              3.1.2           pypi_0           pypi
joblib              1.3.2           pypi_0           pypi
kiwisolver          1.4.5           pypi_0           pypi
libffi              3.4.4           hecd8cb5_0
llvmlite            0.41.1          pypi_0           pypi
markupsafe          2.1.3           pypi_0           pypi
matplotlib          3.8.2           pypi_0           pypi
mpmath              1.3.0           pypi_0           pypi
natsort             8.4.0           pypi_0           pypi
ncurses             6.4             hcec6c5f_0
networkx            3.2.1           pypi_0           pypi
numba               0.58.1          pypi_0           pypi
numpy               1.26.2          pypi_0           pypi
openssl             3.0.12          hca72f7f_0
packaging           23.2            pypi_0           pypi
pandas              2.1.3           pypi_0           pypi
patsy               0.5.4           pypi_0           pypi
pillow              10.1.0          pypi_0           pypi
pip                 23.3.1          py311hecd8cb5_0
psutil              5.9.6           pypi_0           pypi
pynndescent         0.5.11          pypi_0           pypi
pyparsing           3.1.1           pypi_0           pypi
python              3.11.5          hf27a42d_0
python-dateutil     2.8.2           pypi_0           pypi
pytz                2023.3.post1    pypi_0           pypi
pyyaml              6.0.1           pypi_0           pypi
readline            8.2             hca72f7f_0
requests            2.31.0          pypi_0           pypi
safetensors         0.4.1           pypi_0           pypi
scanpy              1.9.6           pypi_0           pypi
scikit-learn        1.3.2           pypi_0           pypi
scipy               1.11.4          pypi_0           pypi
seaborn             0.12.2          pypi_0           pypi
session-info        1.0.0           pypi_0           pypi
setproctitle        1.3.3           pypi_0           pypi
setuptools          68.0.0          py311hecd8cb5_0
six                 1.16.0          pypi_0           pypi
sqlite              3.41.2          h6c40b1e_0
statsmodels         0.14.0          pypi_0           pypi
stdlib-list         0.10.0          pypi_0           pypi
sympy               1.12            pypi_0           pypi
threadpoolctl       3.2.0           pypi_0           pypi
tk                  8.6.12          h5d9f67b_0
torch               2.1.1           pypi_0           pypi
tqdm                4.66.1          pypi_0           pypi
typing-extensions   4.8.0           pypi_0           pypi
tzdata              2023.3          pypi_0           pypi
umap-learn          0.5.5           pypi_0           pypi
urllib3             1.26.6          pypi_0           pypi
waitress            2.1.2           pypi_0           pypi
wheel               0.41.2          py311hecd8cb5_0
xz                  5.4.2           h6c40b1e_0
zlib                1.2.13          h4dc903c_0

Yanay1 commented 6 months ago

Can you please post the full terminal output? There should be some output before the error. What is in .var_names: gene names or Ensembl IDs?

Are you able to run the example AnnData?
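A quick heuristic for the gene names vs Ensembl IDs question is to count how many entries in .var_names match the Ensembl gene ID pattern. A minimal sketch, assuming standard Ensembl gene IDs ("ENS", an optional species code, "G", then 11 digits, with an optional version suffix); the helper name is hypothetical, and with a real object you would call it on adata.var_names.

```python
import re

# Ensembl gene IDs: "ENS", an optional species code (e.g. "GAL" for chicken,
# "MUS" for mouse), "G", 11 digits, and an optional version suffix like ".2".
ENSEMBL_GENE_RE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def fraction_ensembl(var_names) -> float:
    """Fraction of names that look like Ensembl gene IDs (heuristic only)."""
    names = list(var_names)
    if not names:
        return 0.0
    hits = sum(1 for name in names if ENSEMBL_GENE_RE.match(str(name)))
    return hits / len(names)


print(fraction_ensembl(["TP53", "GAPDH", "MT-CO1"]))                 # 0.0
print(fraction_ensembl(["ENSG00000141510", "ENSGALG00000012345.2"]))  # 1.0
```

A value near 0 suggests gene symbols; a value near 1 suggests Ensembl IDs.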

yhr91 commented 6 months ago

Have you also tried running python eval_single_anndata.py with no additional arguments? That downloads and runs the default .h5ad file, and would establish whether it's an environment issue or something else.

viriditax commented 6 months ago

@yhr91 Running python eval_single_anndata.py worked great: it pulled down 10k_pbmcs_proc, loaded the model ./model_files/4layer_model.torch, and wrote the new AnnData output to ./10k_pbmcs_proc_uce_adata.h5ad.

Running the same script with chicken_heart.h5ad or one of my own AnnData objects (.var_names are gene names, not Ensembl IDs) returns "TypeError: cannot unpack non-iterable NoneType object". The full error follows:

(UCE) $ python eval_single_anndata.py --adata_path chicken_heart.h5ad --dir res --species chicken
Using sample 4 layer model
**********************************
***********chicken_heart.h5ad ERROR***********
**********************************
Traceback (most recent call last):
  File "~/scratch/UCE/eval_single_anndata.py", line 155, in <module>
    main(args, accelerator)
  File "~/scratch/UCE/eval_single_anndata.py", line 83, in main
    processor.preprocess_anndata()
  File "~/scratch/UCE/evaluate.py", line 93, in preprocess_anndata
    self.adata, num_cells, num_genes = \
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot unpack non-iterable NoneType object

anndata is version 0.10.3 (pypi_0 build).

yhr91 commented 6 months ago

Thanks for pointing this out. We made some updates to the code for running cross-species data.

If you follow the instructions in this notebook, it should set everything up correctly to run the script.

We're working on simplifying the interface a bit more for the cross-species setting.

viriditax commented 5 months ago

Thank you for the update, but I'm still getting the same "cannot unpack non-iterable NoneType object" error from eval_single_anndata.py described above. The goal was to use this on a human dataset: I've verified that the AnnData object works in scanpy and other tools, confirmed that the gene names are not Ensembl IDs, etc. I also tried a fresh conda environment with a newly downloaded 4 layer model and hit the same error.

yhr91 commented 5 months ago

Thanks. Can you share the exact command you are using to run evaluation on the human dataset?

Yanay1 commented 5 months ago

Could you please try pulling the latest version of the repo and running the default command again? Thanks!

viriditax commented 5 months ago

I tried again with a fresh conda environment (Python 3.11) and a fresh clone of the git repo, but hit the following error when running the pbmc10k example. Input command: python eval_single_anndata.py

Error obtained:

Using sample AnnData: 10k pbmcs dataset
Downloading ./data/10k_pbmcs_proc.h5ad from https://figshare.com/ndownloader/files/42706966 ...

100%|█████████████████████████████████████| 85.6M/85.6M [00:05<00:00, 16.8MiB/s]
Using sample 4 layer model
Downloading ./model_files/4layer_model.torch from https://figshare.com/ndownloader/files/42706576 ...

100%|█████████████████████████████████████| 3.40G/3.40G [02:48<00:00, 20.2MiB/s]
Proccessing 10k_pbmcs_proc
8029.0
10k_pbmcs_proc (11990, 10809)
Wrote Shapes Dict
10809
Max Code: 613
Traceback (most recent call last):
  File "/Users/no/scratch/UCE/UCE/eval_single_anndata.py", line 155, in <module>
    main(args, accelerator)
  File "/Users/no/scratch/UCE/UCE/eval_single_anndata.py", line 85, in main
    processor.run_evaluation()
  File "/Users/no/scratch/UCE/UCE/evaluate.py", line 145, in run_evaluation
    run_eval(self.adata, self.name, self.pe_idx_path, self.chroms_path,
  File "/Users/no/scratch/UCE/UCE/evaluate.py", line 206, in run_eval
    all_pe = get_ESM2_embeddings(args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/no/scratch/UCE/UCE/evaluate.py", line 151, in get_ESM2_embeddings
    all_pe = torch.load(args.token_file)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/no/miniconda3/envs/UCE2/lib/python3.11/site-packages/torch/serialization.py", line 993, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/no/miniconda3/envs/UCE2/lib/python3.11/site-packages/torch/serialization.py", line 447, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Yanay1 commented 4 months ago

I'm not sure exactly what the issue is. It could be that the file did not download or unzip properly. Maybe try downloading it again?
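"failed finding central directory" is the message PyTorch gives when the file it loads is not a complete zip archive, which fits the incomplete-download theory: torch.save has written a zip container by default since PyTorch 1.6, and the zip central directory sits at the end of the file, so a truncated file loses it. A minimal sketch to check the file before loading it; the helper name is hypothetical, and the path in the usage line is an assumption (the earlier download log shows the token file as ./model_files/all_tokens.torch, so adjust it to whatever args.token_file points at). Note that a file saved with the legacy non-zip torch format would also fail this check.

```python
import zipfile
from pathlib import Path


def check_torch_archive(path: str) -> str:
    """Rough sanity check for a file written by torch.save (zip-based format)."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.stat().st_size == 0:
        return "empty"
    # A complete zip archive ends with an end-of-central-directory record;
    # zipfile.is_zipfile returns False when that record cannot be found.
    if not zipfile.is_zipfile(p):
        return "not a zip archive (possibly truncated or incomplete download)"
    return "ok"


# Assumed path: replace with the file that args.token_file refers to.
print(check_torch_archive("./model_files/all_tokens.torch"))
```

If it reports anything other than "ok", deleting the file and letting the script download it again is the simplest next step.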