

EPCOT


EPCOT (comprehensively predicting EPigenome, Chromatin Organization and Transcription) is a model that jointly predicts epigenomic features, gene expression, high-resolution chromatin contact maps, and enhancer activities from DNA sequence and cell-type-specific chromatin accessibility data.

We have developed resources to help users predict other genomic modalities from ATAC-seq data: a Google Colab notebook and a webpage, https://liu-bioinfo-lab.github.io/EPCOT_APP.github.io/.

<img src="Data/model.png" title="" style="display: inline-block; margin: 0 auto; max-width: 300px">

Dependencies

You can use conda and pip to install the required packages:

conda create -n epcot python=3.9
conda activate epcot
pip install -r requirements.txt

Usage

Prepare inputs to EPCOT

Please see the directory Input/ for how to generate the inputs to EPCOT (one-hot representations of DNA sequences and normalized DNase-seq signals). All human data used in EPCOT are based on the hg38 reference genome, and the data-processing code also targets hg38.
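For orientation, here is a minimal sketch in Python with NumPy of what the two inputs look like. It assumes an A, C, G, T channel order, encodes ambiguous bases such as N as an all-zero column, uses a hypothetical log1p-then-max normalization for the DNase-seq track, and stacks sequence and accessibility into a five-row array. None of these choices are taken from the EPCOT code; the scripts in Input/ define the actual format.

```python
import numpy as np

# Assumed channel order; the scripts in Input/ are authoritative.
BASES = "ACGT"
_LOOKUP = {base: i for i, base in enumerate(BASES)}


def one_hot_encode(seq: str) -> np.ndarray:
    """Return a (4, len(seq)) float32 one-hot matrix.

    Bases outside A/C/G/T (e.g. N) become an all-zero column (an assumption).
    """
    encoded = np.zeros((len(BASES), len(seq)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        idx = _LOOKUP.get(base)
        if idx is not None:
            encoded[idx, pos] = 1.0
    return encoded


def normalize_signal(signal: np.ndarray) -> np.ndarray:
    """Hypothetical normalization: log1p, then divide by the track maximum.

    EPCOT's actual DNase-seq normalization is defined in Input/ and may differ.
    """
    logged = np.log1p(np.asarray(signal, dtype=np.float32))
    peak = logged.max()
    return logged / peak if peak > 0 else logged


def build_input(seq: str, signal: np.ndarray) -> np.ndarray:
    """Stack sequence (4 rows) and accessibility (1 row) into a (5, L) array.

    Treating accessibility as a fifth channel is an illustrative assumption.
    """
    if len(seq) != len(signal):
        raise ValueError("sequence and signal must have the same length")
    return np.vstack([one_hot_encode(seq), normalize_signal(signal)[np.newaxis, :]])


if __name__ == "__main__":
    x = build_input("ACGTN", np.array([0.0, 1.0, 3.0, 0.0, 7.0]))
    print(x.shape)  # (5, 5)
```

In practice the sequence would come from an hg38 FASTA and the signal from a processed DNase-seq or ATAC-seq track over the same interval, so both arrays share one genomic coordinate axis.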

Download the pre-trained model and downstream models

You can download the EPCOT models, trained on DNA sequence together with either DNase-seq or ATAC-seq, from Google Drive or via the DOI.

For the trained downstream models and instructions on training them from scratch, see the corresponding directories GEP/, COP/, and EAP/.

Tutorial

We provide a Google Colab notebook, EPCOT_usage.ipynb, that shows how to use EPCOT to predict multiple modalities.

Other materials

We provide a GitHub page that shares our TF sequence binding patterns together with Tomtom motif comparison results; the results are also summarized in the Excel file motif_comparison_summary.xls.