probabilities contain NaN in numpy.random.mtrand.RandomState.choice #33

Closed mcrewcow closed 2 months ago

mcrewcow commented 3 months ago

Hi, thank you for developing this package!

I have an issue running the tool on my dataset. It was mainly built in Seurat and later converted to .h5ad with SeuratDisk. First there was an error about the _index column, which I fixed with:

fetal_total.__dict__['_raw'].__dict__['_var'] = fetal_total.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

Now, with

python --adata_path=/mnt/c/Bioinf/HUMAN_FETAL_RETINA/COMBINED_EKPB_v1_clean_int_indexed.h5ad --dir=/mnt/c/Bioinf/HUMAN_FETAL_RETINA/ --species=human

I receive the following error. The output is provided:

[2024-03-30 03:33:45,762] [INFO] [] Setting ds_accelerator to cuda (auto detect)
Using sample 4 layer model
Proccessing COMBINED_EKPB_v1_clean_int_indexed
COMBINED_EKPB_v1_clean_int_indexed (113073, 1861)
Wrote Shapes Dict
Max Code: 612
Loaded model:
  0%|                                                                                                                                                     | 0/4523 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "", line 155, in <module>
    main(args, accelerator)
  File "", line 85, in main
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 146, in run_evaluation
    self.starts_path, shapes_dict, self.accelerator, self.args)
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 235, in run_eval
    for batch in pbar:
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/tqdm/", line 1195, in __iter__
    for obj in iterable:
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/accelerate/", line 377, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 628, in __next__
    data = self._next_data()
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 65, in __getitem__
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 128, in sample_cell_sentences
  File "mtrand.pyx", line 935, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN

Thank you for your help!

Yanay1 commented 2 months ago

This error usually happens when you have cells with no genes expressed. Please double check that the .X slot contains count values, and that all cells have gene expression.

mcrewcow commented 2 months ago

Hi Yanay1,

I checked the object, and it has both .X and .raw.X slots. After the h5seurat to h5ad conversion I can run all the downstream analyses in scanpy, scvelo, scenic+, etc. I have also noticed that if I convert the 'integrated' assay of the Seurat object to .h5ad, I get the error described above. If I convert the 'RNA' assay instead, I get the following:

python --adata_path=/mnt/c/Bioinf/HUMAN_FETAL_RETINA/COMBINED_EKPB_v1_clean_RNA_indexed.h5ad --dir=/mnt/c/Bioinf/HUMAN_FETAL_RETINA/ --species=human --multi_gpu=True
[2024-04-01 15:54:41,230] [INFO] [] Setting ds_accelerator to cuda (auto detect)
Using sample 4 layer model
Proccessing COMBINED_EKPB_v1_clean_RNA_indexed

Yanay1 commented 2 months ago

What is the result of

min(np.sum(fetal_int.X, axis=0))

Try deleting all the intermediate files created by UCE and then rerunning.
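
A note on the axis when running this kind of check: in an AnnData object the rows of .X are cells and the columns are genes, so `axis=0` gives one total per gene and `axis=1` gives one total per cell. Both are worth looking at, together with the smallest single value. The sketch below uses a made-up toy matrix (`X` is hypothetical and stands in for a dense .X):

```python
import numpy as np

# Hypothetical toy matrix: 3 cells (rows) x 4 genes (columns).
X = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],   # a cell with no counts at all
    [0.0, 5.0, -1.5, 3.0],  # a negative value, as in scaled data
])

per_gene = X.sum(axis=0)  # one total per gene (column sums)
per_cell = X.sum(axis=1)  # one total per cell (row sums)

print("min per-gene total:", per_gene.min())
print("min per-cell total:", per_cell.min())
print("smallest single value:", X.min())
```

If .X is a scipy sparse matrix, `np.asarray(X.sum(axis=1)).ravel()` should give the same per-cell totals as a flat array.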

mcrewcow commented 2 months ago

Unfortunately, deleting the intermediate files did not help.

This is the output of the command: -2559.4630599647617

Yanay1 commented 2 months ago

You cannot have negative numbers in .X. The expression values are used as probability weights. They should be count values.
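
To illustrate why this matters, here is a minimal sketch of weighted sampling with `numpy.random.RandomState.choice`, the function named in the traceback. It is not UCE's actual sampling code; `to_probabilities` and `try_sample` are hypothetical helpers, and the simple divide-by-sum normalization is an assumption made for the example.

```python
import numpy as np

def to_probabilities(weights):
    """Normalize weights so they sum to 1 (deliberately no validation)."""
    weights = np.asarray(weights, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return weights / weights.sum()

def try_sample(weights, n=3, seed=0):
    """Return sampled indices, or the error message if sampling fails."""
    rng = np.random.RandomState(seed)
    p = to_probabilities(weights)
    try:
        return rng.choice(len(p), size=n, p=p)
    except ValueError as err:
        return f"ValueError: {err}"

print(try_sample([3, 0, 1, 6]))       # valid counts: sampling works
print(try_sample([0, 0, 0, 0]))       # empty cell: 0/0 gives NaN probabilities
print(try_sample([2.0, -1.0, 1.0]))   # negative weight: also rejected
```

A cell whose weights are all zero produces NaN probabilities after normalization, and a cell with negative weights produces an invalid distribution, so in this sketch both fail with a `ValueError`.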

mcrewcow commented 2 months ago

Oh, I have found that this is an issue with the SeuratDisk conversion: the counts are written to .raw.X. I have now transferred them to .X, and the values look like real counts. I have also filtered the genes with min_counts = 40. Still, I keep getting:
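
One thing that may be worth checking at this step: a gene filter can leave some cells with no remaining counts, because a cell may only have had counts in the genes that were removed. A minimal sketch of filtering genes and then re-checking cells, on a hypothetical dense array (`filter_counts` and its parameters are illustrative names, not part of scanpy or UCE):

```python
import numpy as np

def filter_counts(X, min_gene_counts=40, min_cell_counts=1):
    """Drop low-count genes, then drop cells left with too few counts.

    Returns the filtered matrix plus boolean masks of kept cells and genes.
    """
    X = np.asarray(X)
    keep_genes = X.sum(axis=0) >= min_gene_counts
    X_genes = X[:, keep_genes]
    # Re-check cells AFTER the gene filter: a cell may have had counts
    # only in genes that were just removed.
    keep_cells = X_genes.sum(axis=1) >= min_cell_counts
    return X_genes[keep_cells], keep_cells, keep_genes

# Hypothetical toy data: cell 1 only has counts in the rare gene 2.
X = np.array([
    [30, 25, 0],
    [0, 0, 5],
    [20, 40, 0],
])
X_filtered, keep_cells, keep_genes = filter_counts(X)
print("kept genes:", keep_genes)
print("kept cells:", keep_cells)
print("shape after filtering:", X_filtered.shape)
```

In the toy data, gene 2 falls below the threshold, which leaves cell 1 empty, so the second mask removes it.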

python --adata_path=/mnt/c/Bioinf/HUMAN_FETAL_RETINA/COMBINED_EKPB_v1_clean_countsonly_new_indexed_int_maybe.h5ad --dir=/mnt/c/Bioinf/HUMAN_FETAL_RETINA/ --species=human
[2024-04-01 18:16:22,542] [INFO] [] Setting ds_accelerator to cuda (auto detect)
Using sample 4 layer model
Proccessing COMBINED_EKPB_v1_clean_countsonly_new_indexed_int_maybe
COMBINED_EKPB_v1_clean_countsonly_new_indexed_int_maybe (113073, 1861)
Wrote Shapes Dict
Max Code: 612
Loaded model:
  0%|                                                                                          | 0/4523 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "", line 155, in <module>
    main(args, accelerator)
  File "", line 85, in main
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 146, in run_evaluation
    self.starts_path, shapes_dict, self.accelerator, self.args)
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 235, in run_eval
    for batch in pbar:
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/tqdm/", line 1195, in __iter__
    for obj in iterable:
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/accelerate/", line 377, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 628, in __next__
    data = self._next_data()
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/", line 671, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/mcrewcow/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 65, in __getitem__
  File "/mnt/c/Users/rodri/Downloads/UCE-main/UCE-main/", line 128, in sample_cell_sentences
  File "mtrand.pyx", line 935, in numpy.random.mtrand.RandomState.choice
ValueError: probabilities contain NaN

Yanay1 commented 2 months ago

I am not sure what the error could be, other than some cells containing negative numbers, zero counts, or NaNs. If you want, you can email me a copy of the anndata and I can inspect it. My email is (first name) @

It seems the issue happens in the first batch.
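
To narrow down which cells are responsible, one option is to list the rows of .X that are empty, negative, or non-finite. The sketch below works on a hypothetical dense array; `find_bad_cells` is an illustrative helper, not part of UCE or anndata.

```python
import numpy as np

def find_bad_cells(X):
    """Return row indices of cells that cannot serve as sampling weights."""
    X = np.asarray(X, dtype=float)
    with np.errstate(invalid="ignore"):
        non_finite = ~np.isfinite(X).all(axis=1)  # NaN or inf in the row
        negative = (X < 0).any(axis=1)            # any value below zero
        empty = (X == 0).all(axis=1)              # every value exactly zero
    return {
        "non_finite": np.flatnonzero(non_finite),
        "negative": np.flatnonzero(negative),
        "empty": np.flatnonzero(empty),
    }

# Hypothetical toy data with one problem of each kind.
X = np.array([
    [1.0, 0.0, 4.0],
    [0.0, 0.0, 0.0],
    [2.0, -0.3, 1.0],
    [np.nan, 1.0, 0.0],
])
report = find_bad_cells(X)
for reason, rows in report.items():
    print(f"{reason}: rows {rows.tolist()}")
```

The progress bar shows 4523 batches for 113073 cells, which suggests a batch size of about 25. If the loader does not shuffle (an assumption, not confirmed here), running the same check on the first 25 rows, for example `find_bad_cells(X[:25])`, would be a reasonable first step. For a sparse .X, converting a slice with `.toarray()` before calling the helper should work.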
