vanheeringen-lab / ANANSE

Prediction of key transcription factors in cell fate determination using enhancer networks. See full ANANSE documentation for detailed installation instructions and usage examples.
http://anansepy.readthedocs.io
MIT License
77 stars 16 forks

How to construct a network only focusing on my interesting genes? #210

Open socialtree-yt opened 10 months ago

socialtree-yt commented 10 months ago

Hello, thank you for this convenient tool. I want to focus only on TFs regulating a specific set of genes and construct a regulatory network for them. How can I do this? Is it correct to pass TPM expression data for only that gene set to the -e parameter of "ananse network", without changing the inputs to "ananse binding"?

siebrenf commented 10 months ago

I think that is not correct. ANANSE works by combining lots of different values. To do this, each data type is scaled from 0-1, then averaged between data types. If you remove part of your expression data set, the remaining genes will still be scaled from 0-1, so some genes are now considered super important and others are considered unimportant, while reality can be totally different.
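To make the scaling problem concrete, here is a minimal sketch. It assumes plain min-max scaling, which may not be exactly what ANANSE does internally, and the gene names and TPM values are made up for illustration:

```python
def min_max_scale(values):
    """Scale a dict of gene -> value to the 0-1 range (illustrative only)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {gene: 0.0 for gene in values}
    return {gene: (v - lo) / (hi - lo) for gene, v in values.items()}


# Hypothetical TPM values, not real data.
full_tpm = {"geneA": 0.0, "geneB": 5.0, "geneC": 10.0, "geneD": 1000.0}

# The same data after dropping the highly expressed geneD.
subset_tpm = {g: full_tpm[g] for g in ("geneA", "geneB", "geneC")}

full_scaled = min_max_scale(full_tpm)
subset_scaled = min_max_scale(subset_tpm)

print(full_scaled["geneC"])    # 0.01
print(subset_scaled["geneC"])  # 1.0
```

In the full data set geneC is a lowly expressed gene, but in the subset it becomes the top-scoring gene purely because the reference range changed.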

I think you should use the full dataset (for each data type), and then filter your results afterward. Furthermore, in ANANSE influence, you can whitelist your genes of interest with the -w parameter. This will make sure they are used to determine the differential network and top influential TF list.
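Filtering afterward could look roughly like the sketch below. It assumes the network file is tab-separated with a tf_target column in which the TF and target are joined by a long dash, as in the sample output later in this thread; the function name and the example rows are hypothetical:

```python
import csv
import io

# Separator between TF and target gene in the tf_target column.
# Assumption based on the sample output shown in this thread.
SEPARATOR = "\u2014"


def filter_network(lines, genes_of_interest):
    """Keep only TF-target links whose target is in genes_of_interest."""
    reader = csv.DictReader(lines, delimiter="\t")
    kept = []
    for row in reader:
        tf, _, target = row["tf_target"].partition(SEPARATOR)
        if target in genes_of_interest:
            kept.append((tf, target, float(row["prob"])))
    return kept


# Made-up rows standing in for a network built on the full data set.
example = io.StringIO(
    "tf_target\tprob\n"
    "TF1\u2014geneA\t0.9\n"
    "TF1\u2014geneB\t0.2\n"
    "TF2\u2014geneA\t0.5\n"
)

hits = filter_network(example, {"geneA"})
print(hits)  # [('TF1', 'geneA', 0.9), ('TF2', 'geneA', 0.5)]
```

For a real run you would pass an open file handle for your own network output instead of the in-memory example.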

Hope this helps!

socialtree-yt commented 10 months ago

OK. I understand. Thank you for your help!

socialtree-yt commented 10 months ago

Hi, I also want to ask whether I can use only a subset of regions to construct networks, because I see that the -r and -f parameters of "ananse network" can restrict the network to part of the regions.

siebrenf commented 10 months ago

The same rules apply. It's fine if, for example, you wish to filter for a specific chromosome (a superset that contains regions of interest and background regions) or you wish to remove outlier regions. But if your regions only contain regions of interest it will probably skew your results.

socialtree-yt commented 10 months ago

How can I choose background regions if I use narrowPeak files as input regions? Apart from my regions of interest, how many and which regions should be included as background regions? Thank you for your help!

siebrenf commented 9 months ago

You "choose" background regions by not removing them from your narrowPeaks :) In general: more is better.

If you want to shrink your input data down, I guess you should keep 5 random peaks for each peak of interest. To predict motif activity, ANANSE randomly samples 3*50,000 regions. So if you shrink your input data down to 150,000 regions you should still be good. If you have fewer than 150,000 regions it's fine, but maybe don't remove any.
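A minimal sketch of that downsampling, assuming peaks are held as (chrom, start, end) tuples; the function name, the seed and the example peaks are hypothetical and this is not part of ANANSE itself:

```python
import random


def downsample_peaks(all_peaks, peaks_of_interest, ratio=5, seed=42):
    """Keep every peak of interest plus `ratio` random background peaks each."""
    interest = set(peaks_of_interest)
    background = [p for p in all_peaks if p not in interest]
    # Never ask for more background peaks than exist.
    n_background = min(len(background), ratio * len(interest))
    rng = random.Random(seed)
    sampled = set(rng.sample(background, n_background))
    # Preserve the original order of the peak list.
    return [p for p in all_peaks if p in interest or p in sampled]


# Made-up peaks for illustration.
all_peaks = [("chr1", i * 1000, i * 1000 + 200) for i in range(100)]
peaks_of_interest = all_peaks[:3]

kept = downsample_peaks(all_peaks, peaks_of_interest)
print(len(kept))  # 18
```

With fewer than roughly 150,000 peaks in total there is little reason to run this step at all.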

socialtree-yt commented 9 months ago

But if I use 5 times as many random peaks as peaks of interest, how can I identify the TFs targeting in peaks of interest rather than random background regions? And how can I know which TF target downstream genes through peaks of interest rather than random background regions?

socialtree-yt commented 9 months ago

One other thing: I have focused on some TFs and target genes, but how can I specify which enhancers mediate these processes? I want to select several specific and representative TF-enhancer-genes as samples to illustrate my theme. The ananse view only has relationships between TFs and enhancers but not with relations between enhancers and genes.

socialtree-yt commented 9 months ago

So can I select several specific and representative TF-enhancer-genes as samples from ananse results? It seems to be difficult. Thank you for your help!

siebrenf commented 9 months ago

Happy new year! :)

> how can I identify the TFs targeting in peaks of interest rather than random background regions?

You can use all peaks and all TFs for this. Run ANANSE binding, then use ananse view to see which TFs are most active in which regions. You can use filters in ANANSE view.

> how can I know which TF target downstream genes through peaks of interest rather than random background regions?

Use all peaks in ANANSE binding. Then run ananse network --tfs tfs_of_interest.txt --regions regions_of_interest.bed. This way, the TF activity scores have been made with a proper background!

Note that ANANSE network outputs TF-gene links! The full output looks like this:

tf_target            prob        tf_expression       target_expression   weighted_binding  activity
LOC100127624—42Sp43  0.33331668  0.0532994923857868  0.8891054506351234  0.115612045       0.27524972
LOC100127624—42Sp50  0.3161416   0.0532994923857868  0.936017205457356   0.0               0.27524972

Headers:

> I want to select several specific and representative TF-enhancer-genes as samples to illustrate my theme. The ananse view only has relationships between TFs and enhancers but not with relations between enhancers and genes.

This is correct. ANANSE binding links TFs to enhancers. ANANSE network links enhancers to target genes, and combines these values to return a TF-target gene link. You could try to puzzle out which enhancers were involved, but I can't vouch for the reliability of the results. ANANSE's core strength is the influence output: identifying key transcription factor differences between two conditions.
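If you do want to puzzle it out, one rough starting point is to list the regions that lie near a target gene's TSS. The sketch below is only a proximity heuristic, not ANANSE's own enhancer-to-gene logic; the function name, the 100 kb window and the example coordinates are assumptions made for illustration:

```python
def candidate_enhancers(regions, tss_chrom, tss_pos, window=100_000):
    """Return regions whose midpoint lies within `window` bp of a TSS, nearest first."""
    hits = []
    for chrom, start, end in regions:
        if chrom != tss_chrom:
            continue
        distance = abs((start + end) // 2 - tss_pos)
        if distance <= window:
            hits.append((distance, chrom, start, end))
    return sorted(hits)


# Made-up regions as (chrom, start, end) and a made-up TSS position.
regions = [
    ("chr1", 10_000, 10_200),
    ("chr1", 500_000, 500_200),
    ("chr2", 10_000, 10_200),
]

print(candidate_enhancers(regions, "chr1", 50_000))
# [(39900, 'chr1', 10000, 10200)]
```

Regions found this way are only candidates; whether a TF actually acts on the gene through one of them would still need separate evidence.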

socialtree-yt commented 9 months ago

Thank you for your help! So if I use "ananse network --tfs tfs_of_interest.txt --regions regions_of_interest.bed" with 1 TF and 1 region in the BED file, I can depict the exact TF-enhancer-gene triple from the total network. The triple is the same as in the total network built with "ananse network --tfs all_tfs.txt --regions all_regions.bed". Am I right?