Open socialtree-yt opened 11 months ago
I think that is not correct. ANANSE works by combining lots of different values. To do this, each data type is scaled from 0-1, then averaged between data types. If you remove part of your expression data set, the remaining genes will still be scaled from 0-1, so some genes are now considered super important and others unimportant, while reality can be totally different.
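To illustrate the scaling problem, here is a minimal sketch. It assumes plain min-max scaling and uses made-up gene names and values; ANANSE's actual scaling may differ in detail, so treat this only as a picture of why rescaling a subset shifts the scores.

```python
def min_max_scale(values):
    """Scale a dict of gene -> value to the 0-1 range (assumed min-max scaling)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {gene: 0.0 for gene in values}
    return {gene: (v - lo) / (hi - lo) for gene, v in values.items()}


# Hypothetical expression values, for illustration only.
expression = {"geneA": 0.0, "geneB": 10.0, "geneC": 20.0, "geneD": 100.0}

# Scaling the full dataset versus a subset that drops the highest gene.
full = min_max_scale(expression)
subset = min_max_scale({g: expression[g] for g in ("geneA", "geneB", "geneC")})

print(full["geneC"])    # 0.2
print(subset["geneC"])  # 1.0
```

The same gene goes from a low score to the maximum score just because geneD was removed before scaling.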
I think you should use the full dataset (for each data type), and then filter your results afterward. Furthermore, in ANANSE influence, you can whitelist your genes of interest with the -w parameter. This will make sure they are used to determine the differential network and top influential TF list.
Hope this helps!
OK. I understand. Thank you for your help!
Hi, I also want to ask whether I can use only part of the regions to construct networks, because I found that the -r and -f parameters of "ananse network" can restrict the network to a subset of regions.
The same rules apply. It's fine if, for example, you wish to filter for a specific chromosome (a superset that contains regions of interest and background regions) or you wish to remove outlier regions. But if your regions only contain regions of interest it will probably skew your results.
How can I choose background regions if I used narrowPeak files as input regions? Apart from my regions of interest, how many and which regions should be taken as background regions? Thank you for your help!
You "choose" background regions by not removing them from your narrowPeaks :) In general: more is better.
If you want to shrink your input data down, I guess you should keep 5 random peaks for each peak of interest. To predict motif activity, ANANSE randomly samples 3*50,000 regions. So if you shrink your input data down to 150,000 regions you should still be good. If you have fewer than 150,000 regions it's fine, but maybe don't remove any.
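As a rough sketch of that subsampling rule (not part of ANANSE itself): the function name, its parameters and the toy peak names below are all hypothetical, and peaks are represented as plain strings.

```python
import random


def subsample_peaks(all_peaks, peaks_of_interest, ratio=5, min_total=150_000, seed=0):
    """Keep all peaks of interest plus `ratio` random background peaks for each.

    If there are `min_total` peaks or fewer, nothing is removed.
    """
    if len(all_peaks) <= min_total:
        return list(all_peaks)
    interest = set(peaks_of_interest)
    background = [p for p in all_peaks if p not in interest]
    n_background = min(len(background), ratio * len(interest))
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    keep = interest | set(rng.sample(background, n_background))
    # Preserve the original peak order in the output.
    return [p for p in all_peaks if p in keep]


# Toy example with made-up peaks; min_total=0 forces subsampling on tiny input.
all_peaks = [f"chr1:{i * 1000}-{i * 1000 + 200}" for i in range(100)]
interest = all_peaks[:3]
kept = subsample_peaks(all_peaks, interest, ratio=5, min_total=0)
print(len(kept))  # 18
```

With the default `min_total`, the toy input would be returned unchanged, which matches the advice not to remove anything from small peak sets.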
But if I use 5 times as many random peaks as peaks of interest, how can I identify the TFs targeting the peaks of interest rather than random background regions? And how can I know which TFs target downstream genes through peaks of interest rather than random background regions?
One other thing: I have focused on some TFs and target genes, but how can I specify which enhancers mediate these processes? I want to select several specific and representative TF-enhancer-gene triples as examples to illustrate my theme. The ananse view output only has relationships between TFs and enhancers, not between enhancers and genes.
So can I select several specific and representative TF-enhancer-gene triples as examples from the ANANSE results? It seems to be difficult. Thank you for your help!
Happy new year! :)
how can I identify the TFs targeting the peaks of interest rather than random background regions?
You can use all peaks and all TFs for this. Run ANANSE binding, then use ananse view to see which TFs are most active in which regions. You can use filters in ANANSE view.
how can I know which TFs target downstream genes through peaks of interest rather than random background regions?
Use all peaks in ANANSE binding. Then run ananse network --tfs tfs_of_interest.txt --regions regions_of_interest.bed. This way, the TF activity scores have been made with a proper background!
Note that ANANSE network outputs TF-gene links! The full output looks like this:
tf_target            prob        tf_expression       target_expression   weighted_binding  activity
LOC100127624—42Sp43  0.33331668  0.0532994923857868  0.8891054506351234  0.115612045       0.27524972
LOC100127624—42Sp50  0.3161416   0.0532994923857868  0.936017205457356   0.0               0.27524972
Headers:
I want to select several specific and representative TF-enhancer-gene triples as examples to illustrate my theme. The ananse view output only has relationships between TFs and enhancers, not between enhancers and genes.
This is correct. ANANSE binding links TFs to enhancers. ANANSE network links enhancers to target genes, and combines these values to return a TF-target gene link. You could try to puzzle out which enhancers were involved, but I can't vouch for the reliability of the results. ANANSE's core strength is the influence output: identifying key transcription factor differences between two conditions.
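If you want to pull individual TF-gene links out of the network output shown earlier in this reply, something like the sketch below could work. It assumes the tf_target column joins the TF and the target gene with the "—" character, as in the two example rows, and that columns are separated by whitespace (the real file may be tab-separated, which this also handles). The function name is made up for illustration.

```python
SEP = "\u2014"  # the "—" character joining TF and target in tf_target (assumed from the example rows)

# The two example rows from the network output above.
sample = """tf_target prob tf_expression target_expression weighted_binding activity
LOC100127624—42Sp43 0.33331668 0.0532994923857868 0.8891054506351234 0.115612045 0.27524972
LOC100127624—42Sp50 0.3161416 0.0532994923857868 0.936017205457356 0.0 0.27524972
"""


def parse_network(text):
    """Parse network output text into a list of dicts with separate tf/target keys."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    links = []
    for line in lines[1:]:
        record = dict(zip(header, line.split()))
        tf, target = record["tf_target"].split(SEP, 1)
        link = {"tf": tf, "target": target}
        # All remaining columns are numeric scores.
        link.update({name: float(record[name]) for name in header[1:]})
        links.append(link)
    return links


links = parse_network(sample)
for link in links:
    print(link["tf"], link["target"], link["prob"])
```

From there you can filter on your TFs and target genes of interest, keeping in mind that this table contains TF-gene links only, not the enhancers behind them.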
Thank you for your help! So if I run "ananse network --tfs tfs_of_interest.txt --regions regions_of_interest.bed" with 1 TF and 1 region in the BED file, I can depict the exact TF-enhancer-gene triple in the total network. That triple is the same as in the total network built with "ananse network --tfs all_tfs.txt --regions all_regions.bed". Am I right?
Hello, thank you for your convenient tools. I want to focus only on TFs regulating a specific set of genes and construct a regulatory network from them. How can I do this? Is it right to pass only the gene set's TPM expression data to the -e parameter of "ananse network" and leave the inputs to "ananse binding" unchanged?