Analyses for State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing.
Figure 1. Characterization of multi-omics literature based on a systematic screen of PubMed-indexed articles (up to July 2020).
The comprehensive search terms (see the online repository for details) were collapsed into four categories: integrated omics (*) includes integromics and integrative omics; multi-view (**) includes multi-view|block|source|modal omics; other terms (***) include pan-, trans-, poly-, cross-omics.
The subpanels present:
The PubMed database was searched for articles pertaining to multi-omics on 25th July 2020, using fourteen terms (multi|pan|trans|poly|cross-omics, multi-table|source|view|modal|block omics, integrative omics, integrated omics and integromics), including plural/singular and hyphenated/unhyphenated variants. The search was automated via the Entrez E-utilities API and restricted to Text Words (to avoid matching articles based on the affiliation of authors to companies such as Panomics, Inc. or Integromics S.L.); the full text and additional metadata were retrieved from the PubMed Central (PMC) database for the open access subset of articles. Feature extraction was performed via n-gram matching against the ClinVar (diseases & clinical findings) and NCBI Taxonomy (species) databases, while the annotation of omics references was based on regular expressions capturing phrases with the suffix -ome or -omic (accounting for multi-omic phrases and plural variants). All matches were manually filtered to exclude false or irrelevant matches and to merge plural forms. The article type was collated from five sources:
Figure 2. A flow diagram of the semi-automated multi-omics literature screening effort (up to July 2020).
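The suffix-based annotation of omics references described above can be illustrated with a minimal sketch. This is not the code used in the repository: the pattern `OMICS_PATTERN`, the stop list `FALSE_POSITIVES` and the helpers `normalise` and `extract_omics_terms` are hypothetical names introduced here for illustration, and the small stop list only stands in for the manual filtering step.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not the repository's expression):
# an optional hyphenated prefix such as "multi-", an optional stem,
# then the suffix -ome/-omes/-omic/-omics.
OMICS_PATTERN = re.compile(r"\b(?:[a-z]+-)?[a-z]*om(?:es?|ics?)\b", re.IGNORECASE)

# Illustrative stop list standing in for manual curation of false matches.
FALSE_POSITIVES = {"some", "come", "home", "outcome", "syndrome", "chromosome", "economics"}


def normalise(term: str) -> str:
    """Lower-case a match and merge variants (-omes -> -ome, -omic -> -omics)."""
    term = term.lower()
    if term.endswith("omes"):
        return term[:-1]
    if term.endswith("omic"):
        return term + "s"
    return term


def extract_omics_terms(text: str) -> Counter:
    """Count normalised omics-like terms in a text, skipping known false positives."""
    matches = (normalise(match) for match in OMICS_PATTERN.findall(text))
    return Counter(term for term in matches if term not in FALSE_POSITIVES)


example = (
    "We integrated transcriptomics and proteomic data with the metabolome; "
    "some multi-omics outcomes followed."
)
print(extract_omics_terms(example))
```

A pattern this permissive matches many ordinary words ending in -ome or -omic, which is why a curation step such as the manual filtering mentioned above is still needed after matching.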
Figure 3. Overview of the notebooks in this code repository. Click on the plot to display an interactive version, from which you can open the respective notebooks by clicking on the analysis nodes.
This analysis contributed to our introductory review of the multi-omics field, now published in Frontiers in Genetics (open access):
Krassowski M, Das V, Sahu SK and Misra BB (2020) State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Front. Genet. 11:610798. doi: 10.3389/fgene.2020.610798
Prerequisites:
Install the minimal requirements for reproduction and download required data:
pip install -r setup/requirements.txt
Rscript helpers/restore.R
cd data
./download.sh
Install additional requirements for development and testing:
pip install -r setup/requirements-dev.txt
Execute tests with:
python3 -m pytest
Freeze (snapshot) R requirements with:
Rscript helpers/freeze.R
Create the repository overview graph:
pip install nbpipeline
PYTHONPATH=$(pwd):$PYTHONPATH nbpipeline --dry_run -s -O figures/repository.svg --display_graph_with none