Adding peak edges to network -- Add feature request

vanheeringen-lab / ANANSE

Prediction of key transcription factors in cell fate determination using enhancer networks. See full ANANSE documentation for detailed installation instructions and usage examples.

http://anansepy.readthedocs.io

MIT License

77 stars 16 forks source link

Adding peak edges to network -- Add feature request #111

Open mriken opened 3 years ago

mriken commented 3 years ago

Hi, I find ANANSE rather useful in inferring regulatory interactions. I would like to use it to connect the peaks to the target genes directly. Is there a way to do that? Can I somehow extract the peaks from binding.tsv that are contributing to the regulation inferred in network.txt? Or, would it be possible to modify the output of the ananse network command to also return the peak region used to infer the TF-target gene link?

thanks

simonvh commented 3 years ago

Hi @mriken, I'm not sure there is a way to do that. I guess what would be possible is to get the peak to gene mapping that is used to create the network, which will show which peak is associated with which gene, and what the weight of that association is. Would that be what you're interested in?

mriken commented 3 years ago

Hi @simonvh,

thanks for your reply. That would already be terrific, yes.

cheers

simonvh commented 3 years ago

We may add this to ANANSE in a later release. Meanwhile, this would be the Python code to do this. Just update line 5 to your binding.h5 fiel location and save this as peaks2enhancer.py and run it with python peaks2enhancer.py (in your ananse environment).

import pyranges as pr
import pandas as pd
from ananse.network import Network

fname = "/path/to/binding.h5"

with pd.HDFStore(fname) as hdf:
    peaks = hdf.get("_index")

enhancer_pr = pr.PyRanges(
    peaks
    .index.to_series()
    .str.split(r"[:-]", expand=True)
    .rename(columns={0: "Chromosome", 1: "Start", 2: "End"})
)

n = Network(genome="hg38") # If genome is not hg38 or mm10 you need to specify gene_bed as well!

# Link enhancers to genes on basis of distance to annotated TSS
gene_df = n.enhancer2gene(
    enhancer_pr,
)
gene_df = gene_df.dropna()

gene_df.to_csv("enhancer2gene.txt", sep="\t")

mriken commented 3 years ago

Wonderful, very appreciated!!

However, after running ananse binding I do not see the binding.h5 file in the output directory at all. I only have the files binding.tsv and factor_activity.tsv, other than the atac/h3k27ac tsv files.

Is this in the development version of ananse, perhaps? Or does it mean that the prediction is incomplete? I don't see any error messages printed on screen when I run it.

simonvh commented 3 years ago

Ah, sorry! This is indeed in the latest version, which was just released yesterday (0.3.0). This version is faster for ananse network and uses less memory.

mriken commented 3 years ago

Cool, I'll upgrade then and try. Cheers