EPCOT

EPCOT (comprehensively predicting EPigenome, Chromatin Organization and Transcription) is a model that jointly predicts epigenomic features, gene expression, high-resolution chromatin contact maps, and enhancer activities from DNA sequence and cell-type-specific chromatin accessibility data.

We have developed resources to help users predict other genomic modalities from ATAC-seq: a Google Colab notebook and a webpage, https://liu-bioinfo-lab.github.io/EPCOT_APP.github.io/.

Dependencies

einops (0.3.2)
kipoiseq (0.5.2)
numpy (1.19.5)
torch (1.10.1)
scipy (1.7.3)
scikit-learn (1.0.2)

You can use conda and pip to install the required packages:

conda create -n epcot python=3.9
conda activate epcot
pip install -r requirements.txt
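After installation, you may want to confirm that the environment matches the versions pinned under Dependencies. The script below is a minimal sketch, not part of the EPCOT repository: the helper name check_versions is hypothetical, and it assumes Python 3.8+ (for importlib.metadata) and ignores local build suffixes such as "+cu113" on torch wheels.

```python
from importlib.metadata import version, PackageNotFoundError

# Versions pinned in the Dependencies list of this README.
PINNED = {
    "einops": "0.3.2",
    "kipoiseq": "0.5.2",
    "numpy": "1.19.5",
    "torch": "1.10.1",
    "scipy": "1.7.3",
    "scikit-learn": "1.0.2",
}


def check_versions(pinned, get_version=version):
    """Hypothetical helper: return {package: (expected, installed_or_None)} for each mismatch."""
    mismatches = {}
    for pkg, expected in pinned.items():
        try:
            installed = get_version(pkg)
        except PackageNotFoundError:
            installed = None
        # Compare only the public version, e.g. "1.10.1" from "1.10.1+cu113".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[pkg] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(PINNED)
    for pkg, (expected, installed) in problems.items():
        print(f"{pkg}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned dependencies match.")
```

A mismatch is not necessarily fatal, but matching the pinned versions is the safest way to load the released models.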

Usage

Prepare inputs to EPCOT

See the directory Input/ for instructions on generating the inputs to EPCOT (one-hot representations of DNA sequences and normalized DNase-seq). All human data used in EPCOT are aligned to the hg38 reference genome, and the data processing code also targets hg38.
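To illustrate what a one-hot representation of DNA looks like, here is a minimal NumPy sketch. It is not the code in Input/, which remains the authoritative reference; the channel order (A, C, G, T), the (4, L) layout, and encoding ambiguous bases such as N as all-zero columns are assumptions made for this example, and the function name one_hot_encode is hypothetical.

```python
import numpy as np

# Assumed channel order; confirm against the scripts in Input/ before use.
ALPHABET = "ACGT"

# Lookup table from ASCII code to channel index; -1 marks bases outside ALPHABET.
_LOOKUP = np.full(256, -1, dtype=np.int64)
for i, base in enumerate(ALPHABET):
    _LOOKUP[ord(base)] = i
    _LOOKUP[ord(base.lower())] = i


def one_hot_encode(seq: str) -> np.ndarray:
    """Hypothetical helper: encode a DNA string as a (4, len(seq)) float32 array.

    Bases outside ACGT (e.g. N) become all-zero columns; some pipelines use
    0.25 in every channel instead, so check which convention the model expects.
    """
    codes = _LOOKUP[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    onehot = np.zeros((len(ALPHABET), len(seq)), dtype=np.float32)
    valid = codes >= 0
    # Set a 1 at (channel, position) for every recognized base.
    onehot[codes[valid], np.nonzero(valid)[0]] = 1.0
    return onehot


print(one_hot_encode("ACGTN"))
```

For "ACGTN", the first four columns form a 4×4 identity matrix and the last column is all zeros. Using a vectorized lookup table avoids a Python loop over each base, which matters when encoding long genomic windows.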

Download the pre-trained model and downstream models

You can download the EPCOT models trained on DNA sequence plus either DNase-seq or ATAC-seq from Google Drive or

To obtain the trained downstream models, or to train downstream models from scratch, see the corresponding directories GEP/, COP/, and EAP/.

Tutorial

We provide a Google Colab notebook, EPCOT_usage.ipynb, that shows how to use EPCOT to predict multiple modalities.

Other materials

We provide a GitHub page sharing our TF sequence binding patterns along with Tomtom motif comparison results, and we summarize these results in an Excel file, motif_comparison_summary.xls.
