loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

Help with CreateNetwork edges.txt interpretation #273

Open c2b2pss opened 1 month ago

c2b2pss commented 1 month ago

I ran CreateNetwork on a batch of "*_bound.bed" files and got:

  1. adjacency.txt
  2. edges.txt
  3. a paths file and a path_edges file for each TF for which I provided a bed file.

The edges.txt file output is attached.

  1. What are the column names?
  2. Is this the file to use for looking at the whole network?
  3. Which column is source and which is target?
  4. The last column contains TF names that match my input mapping file. However, column 4 ("Sites_3") still retains the original HOCOMOCO names.

If you could clarify which file to use to build the network and in which direction the connections go, it would be very helpful! edges.txt

c2b2pss commented 1 month ago

Any comments?

hschult commented 1 month ago

Hi @c2b2pss,

yes, I realize that the CreateNetwork tool can be confusing. So here is a bit more detail based on the example run given here. Please also see the image in this link as it summarizes the intention of this tool very well.

The CreateNetwork tool builds a TF-TF network based on given TF binding sites. For this to work, it needs two additional pieces of information: the origin gene (the gene that encodes the TF) and the target gene (the gene in whose promoter the TF is bound). The origin gene is provided through the --origin parameter. This is a two-column mapping file with the TF name (left) and the origin gene (right; see motif2gene_mapping.txt). The target genes are provided through a column within the *_bound.bed files. With this, you can run TOBIAS CreateNetwork to create four file types:

1. adjacency.txt

This file contains all direct connections between a source TF and its target TFs. It can be read as "Source TF binds in the promoter of Target TF" (Supplementary Methods of the TOBIAS paper) and is recommended to be used for visualization.

Source  Targets
AR  
ARNT    LIN54, ELF2, IRF2
...
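
To load this file for visualization, it can be flattened into one (source, target) pair per connection. The following is a minimal sketch, not part of TOBIAS: it assumes the file is tab-separated with the comma-separated targets in the second column, as the excerpt above suggests, and the function name read_adjacency is made up for illustration.

```python
import csv
import io

# Rows copied from the adjacency.txt excerpt above.
# Assumption: columns are tab-separated in the real file.
SAMPLE = "Source\tTargets\nAR\t\nARNT\tLIN54, ELF2, IRF2\n"


def read_adjacency(handle):
    """Return a list of (source, target) pairs from an adjacency.txt-style table."""
    edges = []
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the "Source / Targets" header line
    for row in reader:
        if not row:
            continue
        source = row[0].strip()
        targets = row[1] if len(row) > 1 else ""
        for target in targets.split(","):
            target = target.strip()
            if target:  # a TF without targets (e.g. AR) adds no edges
                edges.append((source, target))
    return edges


edges = read_adjacency(io.StringIO(SAMPLE))
print(edges)
```

For a real run, open the file with open("adjacency.txt", newline="") and pass the handle in place of the io.StringIO sample.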

2. edges.txt

This file contains the TF binding locations used to create the network. All *_bound.bed files are combined but filtered for sites that target genes with a known TF motif. Columns named Sites_x come from the .bed files and Origin_x columns come from the TF-to-Gene mapping file (--origin).

Sites_0 Sites_1 Sites_2 Sites_3 Sites_4 Sites_5 Sites_6 Sites_7 Sites_8 Sites_9 Sites_10    Sites_11    Sites_12    Sites_13    Sites_14    Origin_0    Origin_1
CHR4    83013103    83013109    ARNT    8.10161 -   CHR4    83012435    83013425    BCELL,TCELL .   .   ENSG00000189308 LIN54   25.46566    LIN54   ENSG00000189308
CHR4    139177963   139177969   ARNT    8.10161 -   CHR4    139176415   139178557   BCELL,TCELL .   .   ENSG00000109381 ELF2    27.12301    ELF2    ENSG00000109381
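
Comparing these two rows with the adjacency.txt excerpt (ARNT targets LIN54 and ELF2), the source TF appears to be in Sites_3 and the target TF in Origin_0. This reading is inferred from the example only; the Sites_x positions depend on the layout of the input .bed files. A minimal sketch under that assumption, with tab separation and the function name source_target_pairs also being assumptions:

```python
import csv
import io

# Header and rows copied from the edges.txt excerpt above.
# Assumption: columns are tab-separated in the real file.
HEADER = [f"Sites_{i}" for i in range(15)] + ["Origin_0", "Origin_1"]
ROWS = [
    ["CHR4", "83013103", "83013109", "ARNT", "8.10161", "-",
     "CHR4", "83012435", "83013425", "BCELL,TCELL", ".", ".",
     "ENSG00000189308", "LIN54", "25.46566", "LIN54", "ENSG00000189308"],
    ["CHR4", "139177963", "139177969", "ARNT", "8.10161", "-",
     "CHR4", "139176415", "139178557", "BCELL,TCELL", ".", ".",
     "ENSG00000109381", "ELF2", "27.12301", "ELF2", "ENSG00000109381"],
]
SAMPLE = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def source_target_pairs(handle, source_col="Sites_3", target_col="Origin_0"):
    """Collect unique (source TF, target TF) pairs from an edges.txt-style table.

    The default column names are inferred from the example rows, not from
    TOBIAS documentation; adjust them to match your own .bed layout.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return sorted({(row[source_col], row[target_col]) for row in reader})


pairs = source_target_pairs(io.StringIO(SAMPLE))
print(pairs)
```

Several binding sites can support the same TF-TF connection, which is why the pairs are deduplicated with a set before sorting.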

3. *_path_edges.txt

Similar to adjacency.txt, this file contains connections between TFs; however, it is limited to a single source TF. The Level column indicates whether the connection between two TFs is direct or indirect (see graph theory level).

Source  Target  Level
ARNT    LIN54   1
ARNT    IRF2    1
ARNT    ELF2    1
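
To separate direct from indirect connections, the rows can be grouped by Level. In the excerpt above, the connections that also appear in adjacency.txt carry Level 1, so Level 1 is treated as "direct" here; that is an inference from the example. Tab separation and the function name edges_by_level are likewise assumptions in this sketch.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the *_path_edges.txt excerpt above.
# Assumption: columns are tab-separated in the real file.
SAMPLE = (
    "Source\tTarget\tLevel\n"
    "ARNT\tLIN54\t1\n"
    "ARNT\tIRF2\t1\n"
    "ARNT\tELF2\t1\n"
)


def edges_by_level(handle):
    """Group (source, target) pairs by the integer value of the Level column."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        grouped[int(row["Level"])].append((row["Source"], row["Target"]))
    return dict(grouped)


by_level = edges_by_level(io.StringIO(SAMPLE))
direct = by_level.get(1, [])  # assumed: Level 1 marks direct connections
print(direct)
```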

4. *_paths.txt

This file contains all paths with the respective TF. The n_nodes column gives the number of nodes (TFs) involved in any given path.

Regulatory_path n_nodes
ARNT --> LIN54  2
ARNT --> ELF2   2
ARNT --> IRF2   2
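
Each path can be split on the "-->" separator to recover the ordered list of TFs, and n_nodes can serve as a consistency check. A minimal sketch, assuming tab-separated columns; the function name read_paths is made up for illustration:

```python
import csv
import io

# Rows copied from the *_paths.txt excerpt above.
# Assumption: columns are tab-separated in the real file.
SAMPLE = (
    "Regulatory_path\tn_nodes\n"
    "ARNT --> LIN54\t2\n"
    "ARNT --> ELF2\t2\n"
    "ARNT --> IRF2\t2\n"
)


def read_paths(handle):
    """Return each regulatory path as an ordered list of TF names."""
    paths = []
    for row in csv.DictReader(handle, delimiter="\t"):
        nodes = [tf.strip() for tf in row["Regulatory_path"].split("-->")]
        # n_nodes should equal the number of TFs on the path
        if len(nodes) != int(row["n_nodes"]):
            raise ValueError(f"n_nodes mismatch in path: {row['Regulatory_path']}")
        paths.append(nodes)
    return paths


paths = read_paths(io.StringIO(SAMPLE))
print(paths)
```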

I hope this clears things up!

Best wishes, Hendrik

github-actions[bot] commented 1 day ago

No activity for at least 30 days. Marking issue as stale. Stale issues are closed after one week.