liusy-jz / MODAS

MODAS: Multi-Omics Data Association Study toolkit
GNU General Public License v3.0

MODAS

Introduction

MODAS (Multi-Omics Data Association Study toolkit) is an efficient software toolkit for association analysis of high-dimensional omics data, featuring five main characteristics.

Note: Sample data for MODAS2 can be downloaded via DOI https://zenodo.org/doi/10.5281/zenodo.11951520.

MODAS analytical pipeline

Downloading example data

MODAS_data contains the sample data for MODAS and the omics data used in the article. It is hosted with the Git extension Git Large File Storage (LFS). First download Git LFS from https://git-lfs.github.com/ and place the git-lfs binary on your system's executable $PATH or equivalent. Then set up Git LFS for your user account by running:

git lfs install

Next, download MODAS_data by running:

git clone https://github.com/liusy-jz/MODAS_data.git

When the download is complete, first check the integrity of the downloaded data. MODAS_data contains five folders, namely agronomic_traits, genotype, metabolome, transcriptome and example_data, as well as a gene annotation file for maize. The example_data folder contains sample data for MODAS, while the other folders contain the omics data used in the article.

Then enter the MODAS_data directory:
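The integrity check described above can be sketched in Python. This is a minimal sketch, not part of MODAS: the helper names `missing_dirs` and `lfs_pointers` are hypothetical, and the only facts it relies on are the five folder names listed above and the standard first line of a Git LFS pointer file. A file that is still a pointer stub usually means Git LFS was not installed before cloning.

```python
from pathlib import Path

# Folder names taken from the README text. The gene annotation file name is
# not stated there, so it is deliberately not checked.
EXPECTED_DIRS = ["agronomic_traits", "genotype", "metabolome",
                 "transcriptome", "example_data"]

# Every Git LFS pointer file starts with this line.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


def lfs_pointers(root):
    """Return files under `root` that are still Git LFS pointer stubs."""
    root = Path(root)
    if not root.is_dir():
        return []
    stubs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            with path.open("rb") as fh:
                if fh.read(len(LFS_MAGIC)) == LFS_MAGIC:
                    stubs.append(path)
    return stubs


if __name__ == "__main__":
    # Run from the directory that contains the MODAS_data clone.
    print("Missing folders:", missing_dirs("MODAS_data") or "none")
    print("Unfetched LFS files:", [str(p) for p in lfs_pointers("MODAS_data")] or "none")
```

If any pointer stubs are reported, running `git lfs pull` inside the clone should fetch the real files.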

cd MODAS_data

Generate pseudo-genotype files

MODAS genoidx -g example_data/example_geno -genome_cluster -o example_geno

The pseudo-genotype file generated by the genoidx subcommand is saved as example_geno.genome_cluster.csv.

Prescreen candidate genomic regions for omics data

The prescreen subcommand uses genome-wide genotype files to calculate the kinship matrix. First extract the genotype files:

tar -xvf genotype/chr_HAMP_genotype.tar.gz

Then the pseudo-genotype file example_geno.genome_cluster.csv generated by genoidx and the example.phe.csv file in the example_data folder are used for the prescreen analysis:

MODAS prescreen -g ./chr_HAMP -genome_cluster example_geno.genome_cluster.csv -phe example_data/example.phe.csv -o example

The prescreen subcommand generates two files: example.sig_omics_phe.csv, which contains the phenotype data, and example.phe_sig_qtl.csv, which contains the candidate genomic regions for each phenotype.

Perform regional association analysis to identify QTLs

The outputs of the prescreen subcommand are used for regional association analysis:

MODAS regiongwas -g ./chr_HAMP -phe example.sig_omics_phe.csv -phe_sig_qtl example.phe_sig_qtl.csv -o example

The regiongwas subcommand generates two QTL files: example.region_gwas_qtl_res.csv, which contains the reliable QTL results, and example.region_gwas_bad_qtl_res.csv, which contains the unreliable QTL results.
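The three steps so far (genoidx, prescreen, regiongwas) feed each other through file names, which can be made explicit with a small driver. The following is a sketch under stated assumptions: `build_commands` and `run_pipeline` are hypothetical helper names, the flags are copied from the commands in this README, `MODAS` is assumed to be on `$PATH`, and the genotype archive is assumed to be already extracted to ./chr_HAMP. By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def build_commands(prefix="example", geno="./chr_HAMP"):
    """Return the genoidx, prescreen and regiongwas command lines as argv lists."""
    # genoidx writes <output prefix>.genome_cluster.csv, which prescreen reads.
    cluster = f"{prefix}_geno.genome_cluster.csv"
    return [
        ["MODAS", "genoidx", "-g", "example_data/example_geno",
         "-genome_cluster", "-o", f"{prefix}_geno"],
        ["MODAS", "prescreen", "-g", geno, "-genome_cluster", cluster,
         "-phe", "example_data/example.phe.csv", "-o", prefix],
        # regiongwas consumes the two files written by prescreen.
        ["MODAS", "regiongwas", "-g", geno,
         "-phe", f"{prefix}.sig_omics_phe.csv",
         "-phe_sig_qtl", f"{prefix}.phe_sig_qtl.csv", "-o", prefix],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands())
```

Calling `run_pipeline(build_commands(), dry_run=False)` from inside the MODAS_data directory would execute the steps in order.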

Perform Mendelian randomization analysis

MODAS mr -g ./chr_HAMP -exposure ./example_data/example.exp.csv -outcome agronomic_traits/blup_traits_final.new.csv -qtl example_data/example_qtl_res.csv -mlm -o example

The results of Mendelian randomization analysis are saved as example.MR.csv.

MR-based network analysis

MR-based network analysis is enabled by the -net parameter of the mr subcommand. It uses transcriptome data to identify subnetwork modules:

MODAS mr -g ./chr_HAMP -exposure ./example_data/network_example.exp.csv -outcome ./example_data/network_example.exp.csv -qtl example_data/network_example_qtl.csv -mlm -net -o network_example

Network analysis generates four files: network_example.MR.csv, which contains gene pairs with an MR effect; network_example.edgelist, which contains gene pairs with weights; network_example.cluster_one.result.csv, which contains all identified subnetwork modules; and network_example.sig.cluster_one.result.csv, which contains the significant subnetwork modules.
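A quick way to confirm that the network run produced all four files is to look for them by suffix and count their lines. This is a minimal sketch: `summarize_outputs` is a hypothetical helper, the suffixes are taken from the file names listed above, and no assumption is made about the columns inside each file.

```python
from pathlib import Path

# Suffixes of the four files named in the README for the -net run.
NETWORK_SUFFIXES = [
    ".MR.csv",
    ".edgelist",
    ".cluster_one.result.csv",
    ".sig.cluster_one.result.csv",
]


def summarize_outputs(prefix, suffixes=NETWORK_SUFFIXES, directory="."):
    """Map each expected file name to its line count, or None if it is absent."""
    summary = {}
    for suffix in suffixes:
        path = Path(directory) / f"{prefix}{suffix}"
        if path.is_file():
            with path.open("r", encoding="utf-8", errors="replace") as fh:
                summary[path.name] = sum(1 for _ in fh)
        else:
            summary[path.name] = None
    return summary


if __name__ == "__main__":
    for name, lines in summarize_outputs("network_example").items():
        status = "missing" if lines is None else f"{lines} lines"
        print(f"{name}: {status}")
```

The same helper can be pointed at other steps by passing a different prefix and suffix list, for example `summarize_outputs("example", [".MR.csv"])`.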

Co-associated gene analysis

Co-associated gene analysis is not a MODAS subcommand. It is implemented by the script co-associated.py. The command line is as follows:

python3 example_data/co-associated.py example_data/co_associated.test.pvalue.csv co-associated_test

This writes two files: co-associated_test.cluster.csv, which contains the co-associated gene labels, and co-associated_test.cluster.heatmap.pdf, a heatmap showing the relationships between co-associated genes.

Documentation

Detailed documentation is available at https://modas-bio.github.io/