owkin / PyDESeq2

A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA.
https://pydeseq2.readthedocs.io/en/latest/
MIT License

Error while single/multi factor analysis #71

Closed bhavaygg closed 1 year ago

bhavaygg commented 1 year ago

Hi,

I am using this library on my custom data, but I run into the error below when performing single-factor or multi-factor analysis. My factors are clusters of cells; because I have multiple clusters, I one-hot encoded them and passed the resulting dataframe as the design factors. This is the error I receive:

----> 1 dds.deseq2()

File ~/anaconda3/envs/myenv/lib/python3.9/site-packages/pydeseq2/DeseqDataSet.py:224, in DeseqDataSet.deseq2(self)
    222 self.fit_size_factors()
    223 # Fit an independent negative binomial model per gene
--> 224 self.fit_genewise_dispersions()
    225 # Fit a parameterized trend curve for dispersions, of the form
    226 # math:: f(\mu) = \alpha_1/\mu + a_0
    227 self.fit_dispersion_trend()

File ~/anaconda3/envs/myenv/lib/python3.9/site-packages/pydeseq2/DeseqDataSet.py:266, in DeseqDataSet.fit_genewise_dispersions(self)
    263     self.fit_size_factors()
    265 # Finit init "method of moments" dispersion estimates
--> 266 self._fit_MoM_dispersions()
    268 # Exclude genes with all zeroes
    269 non_zero = ~(self.counts == 0).all()

File ~/anaconda3/envs/myenv/lib/python3.9/site-packages/pydeseq2/DeseqDataSet.py:633, in DeseqDataSet._fit_MoM_dispersions(self)
    630 if not hasattr(self, "size_factors"):
    631     self.fit_size_factors()
--> 633 rde = fit_rough_dispersions(self.counts, self.size_factors, self.design_matrix)
    634 mde = fit_moments_dispersions(self.counts, self.size_factors)
...
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input y contains NaN.

Could someone help me understand what might be wrong?

Morteza-M-Saber commented 1 year ago

Did you check for NaNs in your condition data, as described in the example?

# We start by removing samples for which condition is NaN. If you are using
# another dataset, do not forget to change "condition" for the column of clinical_df
# you wish to use as a design factor in your analysis.
samples_to_keep = ~clinical_df.condition.isna()
counts_df = counts_df.loc[samples_to_keep]
clinical_df = clinical_df.loc[samples_to_keep]

BorisMuzellec commented 1 year ago

Hi @Chokerino,

It looks like you have NaNs in your data. As @Morteza-M-Saber suggested, those NaNs probably come from your design factors and/or design matrix. Could you check whether this is the case?
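
One way to run that check is sketched below with plain pandas. This is only a sketch: the helper `report_nans`, the toy data, and the names `design_df`, `counts_clean` and `design_clean` are hypothetical and not part of PyDESeq2. It assumes counts and design factors are DataFrames indexed by sample. It counts NaNs in both tables, lists samples that have counts but no design row, and then drops samples with incomplete design factors.

```python
import numpy as np
import pandas as pd


def report_nans(counts_df: pd.DataFrame, design_df: pd.DataFrame) -> dict:
    """Summarize where NaNs occur in the counts and in the design factors.

    Hypothetical helper for illustration; not a PyDESeq2 function.
    """
    return {
        # Total number of NaN cells in the count matrix.
        "counts_nan_cells": int(counts_df.isna().sum().sum()),
        # Number of NaN entries in each design-factor column.
        "design_nan_per_column": {
            col: int(n) for col, n in design_df.isna().sum().items()
        },
        # Samples that have counts but no row in the design table.
        "samples_missing_from_design": sorted(
            set(counts_df.index) - set(design_df.index)
        ),
    }


# Toy data (made up): 3 samples x 2 genes, one cluster label missing.
counts_df = pd.DataFrame(
    {"gene1": [10, 0, 5], "gene2": [3, 7, 1]},
    index=["s1", "s2", "s3"],
)
design_df = pd.DataFrame(
    {"cluster": ["A", np.nan, "B"]},
    index=["s1", "s2", "s3"],
)

report = report_nans(counts_df, design_df)
print(report)

# Keep only samples whose design factors are all present.
samples_to_keep = ~design_df.isna().any(axis=1)
counts_clean = counts_df.loc[samples_to_keep]
design_clean = design_df.loc[samples_to_keep]
```

If the report shows no NaNs in either table, the NaNs may be introduced later, for example when the one-hot encoded frame and the counts do not share the same sample index, so comparing the two indexes is a reasonable next step.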