tariks / peakachu

Genome-wide contact analysis using sklearn
MIT License

How to find annotation files of other matrix files from 4DN #9

Open Dweson opened 4 years ago

Dweson commented 4 years ago

Sorry to bother you again. I want to try this model on other interaction matrix files, but I'm having trouble finding the annotation files for them, and I have already spent a lot of time looking. Thank you very much.

tariks commented 4 years ago

I assume you mean an interaction list for training, right? There are a few options:

  1. Find an orthogonal experiment (like ChIA-PET, HiChIP, PLAC-seq, Trac-looping, capture Hi-C, STORM, FISH ...) that reports pairwise interactions in the same cell type as your matrix. Such data don't exist yet for many cell lines. You can start at the 4DN and ENCODE websites. If there isn't anything there, you can search the literature / GEO for supplementary files from publications. Sometimes only raw data are available, and you'll have to process them yourself to get significant interactions.
  2. We recently updated our manuscript as part of the review process. We show that manually selecting ~200 loops by visual inspection of the matrix is enough to train a model. You can load your matrix in Juicebox / HiGlass and record the coordinates of obvious interactions.
  3. Finally, if no orthogonal data are available for your cell type, you can train a model on another cell type at a similar sequencing depth. For example, if you want to predict loops in a Hi-C matrix of liver cells, you can train a model on blood cells and then use it to predict in liver.
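
For option 2, the coordinates you record by eye still need to end up in an interaction list. Here is a minimal sketch of one way to do that, assuming a 10 kb matrix resolution and a plain six-column, tab-separated BEDPE layout (chrom1, start1, end1, chrom2, start2, end2). The function names and the example coordinates are made up for illustration, and it only handles intra-chromosomal loops; check the README for the exact format the training step expects.

```python
import csv
import sys


def loops_to_bedpe(loops, resolution=10_000):
    """Snap (chrom, pos1, pos2) loop anchors to matrix bins.

    Returns six-column BEDPE rows, dropping loops that fall into the
    same pair of bins as one already seen.
    """
    rows = []
    seen = set()
    for chrom, pos1, pos2 in loops:
        # Put the upstream anchor first so duplicates are detected
        # regardless of the order the coordinates were recorded in.
        upstream, downstream = sorted((pos1, pos2))
        start1 = (upstream // resolution) * resolution
        start2 = (downstream // resolution) * resolution
        key = (chrom, start1, start2)
        if key in seen:
            continue
        seen.add(key)
        rows.append(
            (chrom, start1, start1 + resolution,
             chrom, start2, start2 + resolution)
        )
    return rows


def write_bedpe(rows, handle):
    """Write BEDPE rows as tab-separated text to an open file handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical coordinates, standing in for what you would read
    # off the contact map in a viewer.
    manual_loops = [
        ("chr1", 1_052_300, 1_287_900),
        ("chr1", 1_288_100, 1_051_000),  # same bins as above, reversed
        ("chr2", 45_310_000, 45_620_000),
    ]
    write_bedpe(loops_to_bedpe(manual_loops), sys.stdout)
```

Snapping to bins matters because positions read off a viewer rarely land exactly on bin boundaries, and two clicks on the same pixel should count as one loop.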

Hope that helps. If by annotation file you meant things like ChIP-seq tracks or other 1D annotations, these are available from the UCSC Genome Browser and ENCODE, but Peakachu doesn't use them for training or prediction.