miicTeam / miic_R_package

Learning causal or non-causal graphical models using information theory
GNU General Public License v3.0

MIIC


This repository contains the source code for MIIC (Multivariate Information-based Inductive Causation), a causal discovery method based on information theory principles. MIIC learns a large class of causal or non-causal graphical models from purely observational data, while taking into account the effects of unobserved latent variables. Starting from a complete graph, the method iteratively removes dispensable edges by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences from randomizations of the available data. The remaining edges are then oriented based on the signature of causality in observational data.

The more recent and more interpretable extension of MIIC (iMIIC) further distinguishes genuine causes from putative and latent causal effects, while scaling to very large datasets (hundreds of thousands of samples). Since version 2.0, MIIC also includes a temporal mode (tMIIC) to learn temporal causal graphs from stationary time series data.

MIIC has been applied to a wide range of biological and biomedical data, such as single-cell gene expression data, genomic alterations in tumors, live-cell time-lapse imaging data (CausalXtract), and medical records of patients. Thanks to its causal interpretation of the data, MIIC could also bring insights to a broad range of other data science domains (technology, climatology, economics, ...).
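The two core steps described above rest on standard information-theoretic quantities: an edge X-Y becomes dispensable when the conditional mutual information I(X;Y|Z), given some other variables Z, is negligible, and, following the cited MIIC papers, a negative 3-point information I(X;Y;Z) = I(X;Y) - I(X;Y|Z) is the signature of a v-structure X → Z ← Y. The base R sketch below illustrates both effects on simulated binary data using naive plug-in estimates. It is not MIIC's implementation: MIIC uses complexity-corrected estimates (see the `cplx` argument of `miic()`), an optimal discretization of continuous variables and a search over conditioning sets, none of which is reproduced here. All function names below are hypothetical helpers.

```r
# Illustrative sketch only: naive plug-in estimates on discrete data.
# MIIC itself uses complexity-corrected estimates and searches over
# conditioning sets; none of that is reproduced here.

# Empirical joint entropy (in nats) of one or more discrete vectors
entropy <- function(...) {
  joint <- paste(..., sep = "_")
  p <- table(joint) / length(joint)
  -sum(p * log(p))
}

# I(X;Y) = H(X) + H(Y) - H(X,Y)
mutual_info <- function(x, y) entropy(x) + entropy(y) - entropy(x, y)

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
cond_mutual_info <- function(x, y, z) {
  entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)
}

# 3-point information I(X;Y;Z) = I(X;Y) - I(X;Y|Z)
three_point_info <- function(x, y, z) {
  mutual_info(x, y) - cond_mutual_info(x, y, z)
}

# Flip each binary value independently with probability `rate`
flip <- function(v, rate) ifelse(runif(length(v)) < rate, 1 - v, v)

set.seed(1)
n <- 10000

# Chain X -> Z -> Y: X and Y are dependent, but only through Z
x <- rbinom(n, 1, 0.5)
z <- flip(x, 0.1)
y <- flip(z, 0.1)
chain <- c(
  mi  = mutual_info(x, y),             # clearly positive
  cmi = cond_mutual_info(x, y, z),     # close to 0: the X-Y edge is dispensable
  i3  = three_point_info(x, y, z)      # positive: no v-structure
)

# Collider X -> Z <- Y: X and Y are independent and Z = X + Y
x2 <- rbinom(n, 1, 0.5)
y2 <- rbinom(n, 1, 0.5)
z2 <- x2 + y2
collider <- c(
  mi  = mutual_info(x2, y2),           # close to 0
  cmi = cond_mutual_info(x2, y2, z2),  # positive: conditioning on Z creates dependence
  i3  = three_point_info(x2, y2, z2)   # negative: signature of X -> Z <- Y
)

print(round(rbind(chain, collider), 3))
```

On this simulated data, the chain gives a clearly positive I(X;Y), a near-zero I(X;Y|Z) and a positive 3-point information, whereas the collider gives a near-zero I(X;Y) and a negative 3-point information.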

References

Simon F., Comes M. C., Tocci T., Dupuis L., Cabeli V., Lagrange N., Mencattini A., Parrini M. C., Martinelli E., Isambert H., CausalXtract: a flexible pipeline to extract causal effects from live-cell time-lapse imaging data, eLife 2024.

Ribeiro-Dantas M. D. C., Li H., Cabeli V., Dupuis L., Simon F., Hettal L., Hamy A. S., Isambert H., Learning interpretable causal networks from very large datasets, application to 400,000 medical records of breast cancer patients, iScience 2024.

Cabeli V., Li H., Ribeiro-Dantas M., Simon F., Isambert H., Reliable causal discovery based on mutual information supremum principle for finite dataset, Why21 at NeurIPS 2021.

Cabeli V., Verny L., Sella N., Uguzzoni G., Verny M., Isambert H., Learning clinical networks from medical records based on information estimates in mixed-type data, PLoS Comput. Biol. 2020.

Li H., Cabeli V., Sella N., Isambert H., Constraint-based causal structure learning with consistent separating sets, In Advances in Neural Information Processing Systems 2019.

Verny L., Sella N., Affeldt S., Singh PP., Isambert H., Learning causal networks with latent variables from multivariate information in genomic data, PLoS Comput. Biol. 2017.

Affeldt S., Isambert H., Robust Reconstruction of Causal Graphical Models based on Conditional 2-point and 3-point Information, UAI 2015.

Prerequisites

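MIIC contains R and C++ sources, so installing it from source (for example from GitHub) requires a working C++ compiler toolchain, such as Rtools on Windows.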

Installation

From CRAN (release):

install.packages("miic")

Or from GitHub (development):

# install.packages("remotes")
remotes::install_github("miicTeam/miic_R_package")

Quick start

MIIC allows you to create a graph object from a dataset of observations of both discrete and continuous variables, potentially with missing values, while taking into account unobserved latent variables. This example, along with others, is included in the documentation of the main function, which you can open from R with ?miic.

library(miic)

# EXAMPLE HEMATOPOIESIS
data(hematoData)
# execute MIIC (reconstruct graph)
miic_obj <- miic(
  input_data = hematoData, latent = "yes",
  n_shuffles = 10, conf_threshold = 0.001
)

# plot graph with igraph
if(require(igraph)) {
  plot(miic_obj, method="igraph")
}
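To inspect the reconstruction beyond the plot, a sketch along the following lines lists the components of the returned object and the first rows of its edge summary. It assumes, as described in ?miic for current CRAN releases, that the result is a list of class "miic" containing a `summary` data frame; the exact component and column names may differ between versions, so check ?miic for your installed release. The variable name `res` is arbitrary, and the call uses default settings rather than the options of the example above.

```r
library(miic)

# Reconstruct the hematopoiesis network, here with default settings
data(hematoData)
res <- miic(input_data = hematoData)

# The result is a list of class "miic": list its top-level components
str(res, max.level = 1)

# Edge-level results, one row per pair of variables reported in the summary
head(res$summary)
```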

Documentation

The documentation pages are available in the "man" folder and in the auto-generated PDF manual. From R, you can also use help() or ?, for example ?miic.


License

GPL-2 | GPL-3