krassowski / multi-omics-state-of-the-field

Analyses for "State of the field in multi-omics research: from computational needs to data mining and sharing"
https://doi.org/10.3389/fgene.2020.610798
MIT License

Multi-omics: state of the field

Analyses for State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing.

Overview

Figure 1. Characterization of multi-omics literature based on a systematic screen of PubMed-indexed articles (up to July 2020).

The comprehensive search terms (see the online repository for details) were collapsed into four categories: integrated omics (*) includes integromics and integrative omics, multi-view (**) includes multi-view|block|source|modal omics, and other terms (***) include pan-, trans-, poly-, and cross-omics.

The subpanels present:

Methods

The PubMed database was searched for articles pertaining to multi-omics on 25th July 2020, using fourteen terms (multi|pan|trans|poly|cross-omics, multi-table|source|view|modal|block omics, integrative omics, integrated omics and integromics), including plural/singular and hyphenated/unhyphenated variants. The search was automated via the Entrez E-utilities API and restricted to Text Words (to avoid matching articles based on the affiliation of authors to companies such as Panomics, Inc. or Integromics S.L.); the full text and additional metadata were retrieved from the PubMed Central (PMC) database for the open access subset of articles. Feature extraction was performed via n-gram matching against the ClinVar (diseases & clinical findings) and NCBI Taxonomy (species) databases, while annotation of omics references was based on regular expressions capturing phrases with the suffix -ome or -omic (accounting for multi-omic phrases and plural variants). All matches were manually filtered to exclude false or irrelevant matches and to merge plural forms. The article type was collated from five sources:

Flow diagram

Figure 2. A flow diagram of the semi-automated multi-omics literature screening effort (up to July 2020).
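
The search and annotation steps described in Methods can be illustrated with a minimal Python sketch. It is not the code used in the notebooks: the variant-expansion rules, the function names (`search_terms`, `build_esearch_url`, `annotate_omics`), the regular expression and the small `EXCLUDE` list are all assumptions made for illustration. The sketch only builds an ESearch URL restricted to Text Words (`[tw]`) and does not send any request.

```python
import re
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Base terms as listed in the Methods text (singular forms).
PREFIXES = ["multi", "pan", "trans", "poly", "cross"]
MULTI_KINDS = ["table", "source", "view", "modal", "block"]
STANDALONE = ["integrative omic", "integrated omic", "integromic"]


def search_terms():
    """Expand base terms into hyphenated/unhyphenated and singular/plural variants.

    The exact variant rules are an assumption made for this sketch.
    """
    singular = set()
    for sep in ("-", " ", ""):
        for prefix in PREFIXES:
            singular.add(f"{prefix}{sep}omic")      # e.g. multi-omic, multiomic
        for kind in MULTI_KINDS:
            singular.add(f"multi{sep}{kind} omic")  # e.g. multi-view omic
    singular.update(STANDALONE)
    # Add the plural form of every singular term.
    return sorted(singular | {term + "s" for term in singular})


def build_esearch_url(terms, retmax=100):
    """Build (but do not send) an ESearch request restricted to Text Words."""
    query = " OR ".join(f'"{term}"[tw]' for term in terms)
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Words ending in -ome/-omic, optionally plural, allowing hyphenated prefixes.
OMICS_PATTERN = re.compile(r"\b[a-z][a-z-]*om(?:e|ic)s?\b", re.IGNORECASE)

# Hypothetical stand-in for the manual filtering of false matches.
EXCLUDE = {"outcome", "some", "home", "chromosome", "syndrome"}


def annotate_omics(text, exclude=EXCLUDE):
    """Return sorted, lower-cased omics-like terms with plural forms merged."""
    found = set()
    for match in OMICS_PATTERN.findall(text):
        term = match.lower()
        if term.endswith("s"):
            term = term[:-1]  # merge plural into singular, e.g. proteomes -> proteome
        if term not in exclude:
            found.add(term)
    return sorted(found)


if __name__ == "__main__":
    print(len(search_terms()), "search term variants")
    print(annotate_omics("Genomics and proteomes improved the outcome of some multi-omics studies"))
```

The pattern deliberately over-matches (it also captures words such as "outcome" or "syndrome"), which is why a filtering step like the manual one described above is needed. For a query this long, sending the parameters via HTTP POST rather than in the URL may be preferable.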

Code overview

Figure 3. Overview of the notebooks in this code repository. Click on the plot to display an interactive version, from where you can open respective notebooks by clicking on the analysis nodes.

Reference

This analysis was contributed to our introductory review of the multi-omics field, now published in Frontiers in Genetics (open access):

Krassowski M, Das V, Sahu SK and Misra BB (2020) State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front. Genet. 11:610798. doi: 10.3389/fgene.2020.610798

Reproducing

Prerequisites:

Install the minimal requirements for reproduction and download required data:

pip install -r setup/requirements.txt
Rscript helpers/restore.R
cd data
./download.sh

Development and contributing

Install additional requirements for development and testing:

pip install -r setup/requirements-dev.txt

Execute tests with:

python3 -m pytest

Freeze (snapshot) R requirements with:

Rscript helpers/freeze.R

Create the repository overview graph:

pip install nbpipeline
PYTHONPATH=$(pwd):$PYTHONPATH nbpipeline --dry_run -s -O figures/repository.svg --display_graph_with none