lanagarmire / deepimpute

An accurate and efficient deep learning method for single-cell RNA-seq data imputation
MIT License

IndexError in fitting step #13

Closed muyuyang closed 4 years ago

muyuyang commented 4 years ago

Hi,

I'm using deepImpute on a scRNA-seq dataset with 21389 cells and 20499 genes. The fitting step always fails with the index error below. I've tried imputing different numbers of cells and genes, but that didn't help. The error went away once, when I set n_pred to the number of genes, but in that case all the learned model parameters were nan, so I assume that is not the correct way to solve the problem.

Could you please look into this issue? Thank you!

Using TensorFlow backend.
Fitting the model...
Input dataset is 21389 cells (rows) and 20499 genes (columns)
First 3 rows and columns:
                            100009600  100009609  100009614
45719_GSM1112514_SRR805197   0.000000   0.008907        0.0
45719_GSM1112529_SRR805212   0.044703   0.000000        0.0
45719_GSM1112532_SRR805215   0.000000   0.000000        0.0
14848 genes selected for imputation
Traceback (most recent call last):
  File "/Users/Documents/baseline.py", line 37, in <module>
    multinet.fit(train_X)
  File "/Users/Library/Python/3.7/lib/python/site-packages/deepimpute/multinet.py", line 202, in fit
    covariance_matrix = get_distance_matrix(raw, n_pred=n_pred)
  File "/Users/Library/Python/3.7/lib/python/site-packages/deepimpute/multinet.py", line 26, in get_distance_matrix
    potential_pred = raw.columns[VMR > 0]
  File "/Users/Library/Python/3.7/lib/python/site-packages/pandas/core/indexes/base.py", line 2095, in __getitem__
    result = getitem(key)
IndexError: boolean index did not match indexed array along dimension 0; dimension is 20499 but corresponding boolean dimension is 20463
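
For readers hitting the same message: the traceback shows a boolean mask with 20463 entries being applied to an index of 20499 column labels. The sketch below reproduces that class of error on a toy table and shows a mask built from the same columns. It is not deepimpute's code and does not explain why the mask was shorter in this case; the toy data, the integer gene IDs, and the reading of `VMR` as a variance-to-mean ratio are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy expression matrix: 3 cells x 4 genes (hypothetical integer gene IDs).
raw = pd.DataFrame(
    [[0.0, 1.0, 0.0, 2.0],
     [0.0, 3.0, 0.0, 0.0],
     [0.0, 5.0, 0.0, 4.0]],
    columns=[101, 102, 103, 104],
)

# A mask one element too short, standing in for a statistic that was
# computed on fewer genes than the table has columns.
short_mask = np.array([False, True, False])

message = ""
try:
    raw.columns[short_mask]
except IndexError as err:
    message = str(err)  # "boolean index did not match indexed array ..."

# A mask derived from the same columns always has a matching length.
# Assumed meaning of VMR: per-gene variance divided by per-gene mean.
# All-zero genes give NaN, and NaN > 0 evaluates to False.
vmr = raw.var() / raw.mean()
full_mask = (vmr > 0).to_numpy()
selected = raw.columns[full_mask]
```

Comparing `len(mask)` with `raw.shape[1]` before indexing is a quick way to confirm whether a mismatch of this kind is present.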

Puumanamana commented 4 years ago

Hi,

Thanks for reporting the error. It should be fixed now. You can download the latest version and try it. However, it seems your gene expression data is normalized. If you can, you should provide raw counts instead.

Cédric
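
The fractional values in the printed preview (for example 0.008907) are what suggest normalized data. A minimal sketch of that kind of check follows; `looks_like_raw_counts` is a hypothetical helper, not part of deepimpute, and it only tests whether every value is a non-negative whole number.

```python
import numpy as np
import pandas as pd


def looks_like_raw_counts(expr: pd.DataFrame) -> bool:
    """Return True if every value is a non-negative whole number."""
    values = expr.to_numpy(dtype=float)
    # NaN fails the >= 0 comparison, so tables with missing values return False.
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


# Values taken from the preview in the report above (normalized).
normalized = pd.DataFrame([[0.0, 0.008907], [0.044703, 0.0]])
# Made-up integer counts for comparison.
counts = pd.DataFrame([[0, 3], [12, 0]])

normalized_is_raw = looks_like_raw_counts(normalized)
counts_is_raw = looks_like_raw_counts(counts)
```

A table that passes this check is not guaranteed to be raw counts; the check only rules out obviously scaled or transformed values.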

muyuyang commented 4 years ago

Thank you! It works now. The dataset I got is already normalized. Would it affect the results if I provide the normalized dataset as the input?

Puumanamana commented 4 years ago

If it's TPM or FPKM, it should be fine. However, make sure it's not log- or sqrt-transformed; that would not work.
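
If a dataset is known to have been transformed, one option is to invert the transform before fitting. The sketch below assumes the exact transform is known: a natural-log `log(x + 1)` or a plain square root. The helper names are hypothetical, and a different log base or pseudocount would need a different inverse.

```python
import numpy as np
import pandas as pd


def undo_log1p(expr: pd.DataFrame) -> pd.DataFrame:
    """Invert a natural-log log(x + 1) transform (assumed, not detected)."""
    return np.expm1(expr)


def undo_sqrt(expr: pd.DataFrame) -> pd.DataFrame:
    """Invert a square-root transform (assumed, not detected)."""
    return expr ** 2


# Made-up TPM-like values used only to show the round trip.
tpm = pd.DataFrame([[0.0, 12.5], [3.0, 0.0]], columns=["geneA", "geneB"])

restored_from_log = undo_log1p(np.log1p(tpm))
restored_from_sqrt = undo_sqrt(np.sqrt(tpm))
```

Neither helper can tell which transform, if any, was applied; that has to come from the dataset's documentation.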