theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License
147 stars 24 forks source link

model comparison using LOO/WAIC #82

Closed hoeflerb closed 8 months ago

hoeflerb commented 1 year ago

Hi,

From the advanced tutorial documentation, scCODA appears to use the arviz InferenceData object to store the fitted model. This is handy and convenient because arviz provides a lot of useful utilities for generating model summary stats, diagnostic plots, etc. Unfortunately, the scCODA model output appears to not include the log likelihoods, so functions like arviz.waic() and arviz.loo() don't work.

> av.waic(sccoda_filteredl1celltypes_results)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[130], line 1
----> 1 av.waic(sccoda_filteredl1celltypes_results)

File /opt/homebrew/Caskroom/mambaforge/base/envs/scvi-env/lib/python3.10/site-packages/arviz/stats/stats.py:1613, in waic(data, pointwise, var_name, scale, dask_kwargs)
   1546 """Compute the widely applicable information criterion.
   1547 
   1548 Estimates the expected log pointwise predictive density (elpd) using WAIC. Also calculates the
   (...)
   1610        ...: data_waic.waic_i
   1611 """
   1612 inference_data = convert_to_inference_data(data)
-> 1613 log_likelihood = _get_log_likelihood(inference_data, var_name=var_name)
   1614 scale = rcParams["stats.ic_scale"] if scale is None else scale.lower()
   1615 pointwise = rcParams["stats.ic_pointwise"] if pointwise is None else pointwise

File /opt/homebrew/Caskroom/mambaforge/base/envs/scvi-env/lib/python3
/rviz/stats/stats_utils.py:425, in get_log_likelihood(idata, var_name)
    423     return idata.sample_stats.log_likelihood
    424 if not hasattr(idata, "log_likelihood"):
--> 425     raise TypeError("log likelihood not found in inference data object")
    426 if var_name is None:
    427     var_names = list(idata.log_likelihood.data_vars)

TypeError: log likelihood not found in inference data object

Is there a way to include these in the model output (maybe by a parameter to sample_hmc()) or to compute them independently?

Thanks!

hoeflerb commented 1 year ago

Just to clarify, the principal aim here is to do a nested model comparison. For example, to compare two models with and without an interaction term included.

johannesostner commented 1 year ago

Thanks for the inquiry, @hoeflerb!

We recently reimplemented scCODA as part of the pertpy package with improved functionality and API. This will also be the maintained version of scCODA in the future.

There, you can also create an arviz object from the inference result, which actually includes the log-likelihood. I just tried calculating the WAIC on our tutorial dataset, and it seems to work:

image