martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Error when adata.obs_names can be interpreted as integers #52

Closed by daxxio 1 year ago

daxxio commented 1 year ago

I recently ran scDRS on a new data set and received this error:

Performing scDRS group-analysis
`connectivities` not found in `adata.obsp`; run `sc.pp.neighbors` first
Traceback (most recent call last):
  File "/usr/local/bin/scdrs", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/usr/tools/scDRS/bin/scdrs", line 740, in <module>
    fire.Fire()
  File "/home/usr/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/usr/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/usr/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/usr/tools/scDRS/bin/scdrs", line 683, in perform_downstream
    dict_df_res = scdrs.method.downstream_group_analysis(
  File "/home/usr/tools/scDRS/scdrs/method.py", line 775, in downstream_group_analysis
    {"fdr": multipletests(df_reg["pval"].values, method="fdr_bh")[1]},
  File "/home/usr/.local/lib/python3.10/site-packages/statsmodels/stats/multitest.py", line 147, in multipletests
    alphacSidak = 1 - np.power((1. - alphaf), 1./ntests)
ZeroDivisionError: float division by zero

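I was puzzled by this because the same scDRS version ran fine on other data sets, and I could see nothing wrong with this one. After some digging, the cause seems to be that the cell names in adata.obs_names were simply '0', '1', '2', etc. When the score file was read in, these names were interpreted as integers in df_fullscore.index, so they no longer matched the string cell names during the cell alignment steps. Presumably no cells were left after alignment, which would explain why multipletests received zero p-values and ended up dividing by zero.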
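A minimal sketch of the suspected mismatch, using only pandas rather than scDRS itself. The score table contents, the column names, and the variable `obs_names` (a stand-in for adata.obs_names) are made up for illustration; the actual loading code in scDRS may differ.

```python
import io

import pandas as pd

# Hypothetical score file whose cell names are "0", "1", "2".
score_tsv = "cell\tnorm_score\n0\t1.5\n1\t-0.3\n2\t0.8\n"

# Stand-in for adata.obs_names, which holds the cell names as strings.
obs_names = pd.Index(["0", "1", "2"])

# With default type inference, the index column is parsed as integers.
df_fullscore = pd.read_csv(io.StringIO(score_tsv), sep="\t", index_col=0)
index_is_integer = pd.api.types.is_integer_dtype(df_fullscore.index)

# The integer label 0 is not equal to the string label "0",
# so aligning the two sets of cell names finds no overlap.
common = obs_names.intersection(df_fullscore.index)
print(index_is_integer, len(common))
```

If the alignment inside scDRS behaves like this, every downstream table would be empty, which is consistent with the zero-test division in the traceback.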

martinjzhang commented 1 year ago

Hi @daxxio, thank you for reporting the issue. I was unable to locate the code that caused the issue based on your description. I will keep looking. In the meantime, a minimal reproducible example would be helpful. Alternatively, changing the cell names to less ambiguous strings (e.g., 'cell_0') may avoid the issue.
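The renaming workaround could look roughly like the sketch below. It uses a plain pandas Index as a stand-in for adata.obs_names, and the 'cell_' prefix and score values are only examples. With a real AnnData object the new names would be assigned via `adata.obs_names = renamed` before computing and saving the scores, so that the score file and the h5ad file agree.

```python
import io

import pandas as pd

# Stand-in for adata.obs_names with integer-like cell names.
obs_names = pd.Index(["0", "1", "2"])

# Prefix every name so it can no longer be parsed as a number.
renamed = pd.Index([f"cell_{name}" for name in obs_names])

# Round-trip a hypothetical score table through a TSV buffer.
df = pd.DataFrame({"norm_score": [1.5, -0.3, 0.8]}, index=renamed)
buf = io.StringIO()
df.to_csv(buf, sep="\t")
buf.seek(0)
df_back = pd.read_csv(buf, sep="\t", index_col=0)

# The names survive as strings, so all cells align again.
n_common = len(renamed.intersection(df_back.index))
print(n_common)
```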

Martin

martinjzhang commented 1 year ago

Hi @daxxio

Just fixed the issue by converting cell names in df_score to strings. See commit 19da98301842333909c3e8a4c68b67e349784a74
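For readers on an older version, the idea of the fix can be sketched as follows. This is not the code from the commit, only an illustration of casting the index to strings after loading; the file contents and the `obs_names` stand-in for adata.obs_names are made up.

```python
import io

import pandas as pd

# Hypothetical score file with integer-like cell names.
score_tsv = "cell\tnorm_score\n0\t1.5\n1\t-0.3\n2\t0.8\n"
obs_names = pd.Index(["0", "1", "2"])  # stand-in for adata.obs_names

df_score = pd.read_csv(io.StringIO(score_tsv), sep="\t", index_col=0)

# Cast the parsed index back to strings so it matches the cell names.
df_score.index = df_score.index.astype(str)

n_common = len(obs_names.intersection(df_score.index))
print(n_common)
```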

I will merge this into the master branch later so that you can install the updated version from GitHub.

Martin