saezlab / liana-py

LIANA+: an all-in-one framework for cell-cell communication
http://liana-py.readthedocs.io/
GNU General Public License v3.0

Bug on Differential Expression Analysis for CCC & Downstream Signalling Networks Vignette #117

Open maximelepetit opened 2 weeks ago

maximelepetit commented 2 weeks ago

Hello,

Thank you very much for maintaining and improving this package, it is extremely useful and interesting for our research.

I want to report a bug when running the differential analysis vignette.

At this step, after running DESeq2 on the pseudo-bulk profiles:

# concat results across cell types
dea_df = pd.concat(dea_results)
dea_df = dea_df.reset_index().rename(columns={'level_0': groupby}).set_index('index')
dea_df.head()

I get this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_1947289/4220832029.py in ?()
      1 # concat results across cell types
      2 dea_df = pd.concat(dea_results)
----> 3 dea_df = dea_df.reset_index().rename(columns={'level_0': groupby}).set_index('index')
      4 dea_df.head()

~/miniconda3/envs/liana-env/lib/python3.11/site-packages/pandas/core/frame.py in ?(self, keys, drop, append, inplace, verify_integrity)
   6102                     if not found:
   6103                         missing.append(col)
   6104 
   6105         if missing:
-> 6106             raise KeyError(f"None of {missing} are in the columns")
   6107 
   6108         if inplace:
   6109             frame = self

KeyError: "None of ['index'] are in the columns"
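
The KeyError follows from how pandas names the levels of an unnamed MultiIndex. Below is a minimal sketch with toy data (the cell-type keys, gene names, and values are made up for illustration and are not taken from the vignette). When the per-cell-type result tables have an unnamed index, `reset_index()` produces `level_0` and `level_1` columns, so there is no `index` column to set.

```python
import pandas as pd

# Toy stand-ins for the per-cell-type result tables (all values are made up).
toy_results = {
    "B": pd.DataFrame({"log2FoldChange": [0.5, -1.2]}, index=["GeneA", "GeneB"]),
    "T": pd.DataFrame({"log2FoldChange": [2.0, 0.1]}, index=["GeneA", "GeneB"]),
}

# Concatenating a dict yields a two-level MultiIndex: (dict key, original index).
stacked = pd.concat(toy_results)
print(stacked.index.names)  # neither level has a name here

# Unnamed levels become columns called 'level_0' and 'level_1'.
flat = stacked.reset_index()
print(list(flat.columns))

# If the inner index is named 'index', the second column is called 'index' instead.
named = {key: df.rename_axis("index") for key, df in toy_results.items()}
flat_named = pd.concat(named).reset_index()
print(list(flat_named.columns))
```

Whether the gene index carries a name depends on how the result tables were produced, which would explain why the same line can work in one environment and fail in another.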

This error can be fixed like this:

# concat results across cell types
dea_df = pd.concat(dea_results)
dea_df = dea_df.reset_index().rename(columns={'level_0': groupby,'level_1':'index'}).set_index('index')
dea_df.head()
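
A possible alternative that does not depend on the automatically generated `level_0`/`level_1` names is to name both index levels when concatenating. This is only a sketch: `concat_dea` is a hypothetical helper, the value of `groupby` is a placeholder for the variable defined in the vignette, and the toy tables are made up.

```python
import pandas as pd

groupby = "cell_type"  # placeholder; the vignette defines its own groupby variable


def concat_dea(dea_results, groupby):
    """Stack per-cell-type tables, keeping genes as an index named 'index'."""
    # names= labels both levels explicitly, whether or not the inputs were named.
    stacked = pd.concat(dea_results, names=[groupby, "index"])
    # Move only the cell-type level into a column; genes stay in the index.
    return stacked.reset_index(level=groupby)


# Toy result tables with an unnamed gene index (all values are made up).
toy_results = {
    "B": pd.DataFrame({"log2FoldChange": [0.5, -1.2]}, index=["GeneA", "GeneB"]),
    "T": pd.DataFrame({"log2FoldChange": [2.0, 0.1]}, index=["GeneA", "GeneB"]),
}

dea_df = concat_dea(toy_results, groupby)
print(dea_df.head())
```

The result has the same layout as the fix above: genes in an index called `index` and the cell type in a column named after `groupby`.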

The resulting output looks like this: [image: dea_result_liana_nan]

I noticed that NaNs can be introduced in dea_df.

When I remove them with:

# concat results across cell types
dea_df = pd.concat(dea_results)
dea_df = dea_df.reset_index().rename(columns={'level_0': groupby,'level_1':'index'}).set_index('index').dropna()
dea_df

This represents around 4000 rows.

[image: dea_result_liana]

Maxime

dbdimitrov commented 2 weeks ago

Hi @maximelepetit,

Thanks for the issue and for using liana. I will update this line in the tutorial :)

Though I'm really not sure why PyDESeq2 returns NaNs in those cases... I have opened an issue for this. Perhaps I'm missing something, but it's best to double-check: https://github.com/owkin/PyDESeq2/issues/291

I would not do .dropna() in the tutorial by default, as it might accidentally hide some major issues.
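
Instead of dropping the rows, the NaNs can be inspected first. The following is a minimal sketch on a toy table: the column names mirror the usual DESeq2-style result columns, but the values, gene names, and the `groupby` placeholder are made up, and nothing here explains why the NaNs occur.

```python
import numpy as np
import pandas as pd

groupby = "cell_type"  # placeholder; the vignette defines its own groupby variable

# Toy stand-in for the concatenated results (all values are made up).
dea_df = pd.DataFrame(
    {
        groupby: ["B", "B", "T", "T"],
        "baseMean": [10.0, 0.0, 25.0, 3.0],
        "pvalue": [0.01, np.nan, 0.20, 0.04],
        "padj": [0.02, np.nan, 0.30, np.nan],
    },
    index=pd.Index(["GeneA", "GeneB", "GeneA", "GeneB"], name="index"),
)

# Number of missing values in each column.
nan_per_column = dea_df.isna().sum()
print(nan_per_column)

# Boolean mask of rows with at least one NaN (NumPy array, so no index alignment).
has_nan = dea_df.isna().any(axis=1).to_numpy()

# Number of affected rows per cell type.
nan_per_group = dea_df[has_nan].groupby(groupby).size()
print(nan_per_group)

# Inspect the affected rows directly before deciding how to handle them.
print(dea_df[has_nan])
```

Looking at which columns and cell types are affected makes it easier to tell an expected pattern from a genuine problem before any rows are removed.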