Open mriken opened 3 years ago
Hi @mriken, I'm not sure there is a way to do that. I guess what would be possible is to get the peak to gene mapping that is used to create the network, which will show which peak is associated with which gene, and what the weight of that association is. Would that be what you're interested in?
Hi @simonvh,
thanks for your reply. That would already be terrific, yes.
cheers
We may add this to ANANSE in a later release. Meanwhile, this would be the Python code to do this. Just update fname to the location of your binding.h5 file, save the code as peaks2enhancer.py, and run it with python peaks2enhancer.py (in your ananse environment).
import pyranges as pr
import pandas as pd
from ananse.network import Network

fname = "/path/to/binding.h5"

# Read the region names ("chrom:start-end") stored in the binding file
with pd.HDFStore(fname) as hdf:
    peaks = hdf.get("_index")

# Split the region names into the columns that PyRanges expects
enhancer_pr = pr.PyRanges(
    peaks
    .index.to_series()
    .str.split(r"[:-]", expand=True)
    .rename(columns={0: "Chromosome", 1: "Start", 2: "End"})
)

# If genome is not hg38 or mm10 you need to specify gene_bed as well!
n = Network(genome="hg38")

# Link enhancers to genes on basis of distance to annotated TSS
gene_df = n.enhancer2gene(
    enhancer_pr,
)
gene_df = gene_df.dropna()
gene_df.to_csv("enhancer2gene.txt", sep="\t")
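To see what the split on [:-] in the script does to a region name, here is a minimal standalone sketch using only the standard library. The region names and the parse_region helper are made up for illustration and are not part of ANANSE; the sketch only assumes the names follow the chrom:start-end pattern.

```python
import re


def parse_region(name):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    # Same character class as in the script: split on ':' or '-'
    chrom, start, end = re.split(r"[:-]", name)
    # The pieces come back as strings, so convert the coordinates to int
    return chrom, int(start), int(end)


# Hypothetical region names, for illustration only
regions = ["chr1:100-300", "chr2:5000-5200", "chrX:42-242"]
parsed = [parse_region(r) for r in regions]
print(parsed)
```

Note that a contig name that itself contains ":" or "-" would split into more than three pieces, and this sketch would then raise a ValueError.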
Wonderful, very appreciated!!
However, after running ananse binding I do not see the binding.h5 file in the output directory at all. I only have the files binding.tsv and factor_activity.tsv, other than the atac/h3k27ac tsv files.
Is this in the development version of ananse, perhaps? Or does it mean that the prediction is incomplete? I don't see any error messages printed on screen when I run it.
Ah, sorry! This is indeed in the latest version (0.3.0), which was just released yesterday. This version is faster for ananse network and uses less memory.
Cool, I'll upgrade then and try. Cheers
Hi, I find ANANSE rather useful in inferring regulatory interactions. I would like to use it to connect the peaks to the target genes directly. Is there a way to do that? Can I somehow extract the peaks from binding.tsv that are contributing to the regulation inferred in network.txt? Or, would it be possible to modify the output of the ananse network command to also return the peak region used to infer the TF-target gene link?
thanks