loosolab / TOBIAS

Transcription factor Occupancy prediction By Investigation of ATAC-seq Signal
MIT License

TOBIAS and RNAseq data integration #217

Closed: deep-buddingcoder closed this issue 10 months ago

deep-buddingcoder commented 1 year ago

Hi Mette,

This is only a thought/suggestion (not an issue) about TOBIAS in general, although I currently only run TOBIAS_snakemake.

I am curious whether the Looso lab has any plans to integrate RNA-seq data into TOBIAS analysis. I ask because of the following situation:

My sample types: condition A, condition B

My TF of interest: TFx

Note: TFx is only expressed in condition B.

My observation: TOBIAS_snakemake detected a global footprint for TFx in condition A, although TFx is not expressed in condition A (I did not expect this). TOBIAS_snakemake also detected a larger (greater-magnitude) global footprint for TFx in condition B (I expected this).

I plan to perform some additional analysis to check whether chromatin regions in condition A that contain the TFx motif are also bound by another TF that is expressed in condition A. That would explain why regions with the TFx motif are open and show a footprint even though TFx is not expressed in condition A.
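One way to sketch that check: intersect the sites where TFx is called bound in condition A with the bound sites of TFs assumed to be expressed in condition A, and list which other TFs overlap each TFx site. The sketch below does a brute-force interval overlap in plain Python; it assumes sites are given as (chrom, start, end) with BED-style half-open coordinates, and all TF names and coordinates in it are hypothetical. For genome-wide data, a dedicated tool such as `bedtools intersect` would be the more practical choice.

```python
from typing import Dict, List, Tuple

# (chrom, start, end) with BED-style half-open coordinates
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def other_tfs_at_sites(
    target_sites: List[Interval],
    other_tf_sites: Dict[str, List[Interval]],
) -> Dict[Interval, List[str]]:
    """Map each target site (e.g. a TFx site bound in condition A) to the sorted
    names of other TFs that have at least one overlapping bound site.

    Brute-force O(n*m) comparison: fine for a sketch, too slow for genome-wide data.
    """
    return {
        site: sorted(
            tf
            for tf, sites in other_tf_sites.items()
            if any(overlaps(site, s) for s in sites)
        )
        for site in target_sites
    }


# Hypothetical toy input: TFx sites bound in condition A, and bound sites of
# TFs assumed to be expressed in condition A (names and coordinates made up).
tfx_bound_in_a = [("chr1", 100, 120), ("chr2", 500, 520)]
expressed_in_a = {"TFy": [("chr1", 110, 130)], "TFz": [("chr3", 0, 50)]}

result = other_tfs_at_sites(tfx_bound_in_a, expressed_in_a)
print(result)  # {('chr1', 100, 120): ['TFy'], ('chr2', 500, 520): []}
```

TFx sites that end up with an empty list are the ones this analysis cannot explain by another expressed TF.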

My understanding is that, by integrating RNA-seq data, the "overview/all_condition_A_bound.bed" output can be further filtered/screened to keep only those footprints whose TF is known to be expressed in condition A.
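The filtering step described above could look roughly like this minimal sketch. It assumes the TF/motif name sits in one tab-separated column of each BED line (`name_col`, 0-based; the default of 3 is an assumption, not a documented TOBIAS layout, so check it against your own file), and that `expressed_in_a` is a set of TF names you derived from the RNA-seq data with a cutoff of your choice. Motif names in the output may not match gene symbols exactly, so a name mapping step may be needed first. Note that this keeps only sites whose motif name is in the expressed set: sites of a non-expressed TF's motif are dropped even if another TF might occupy them.

```python
from typing import Iterable, Iterator, Set


def filter_bound_bed(
    lines: Iterable[str], expressed: Set[str], name_col: int = 3
) -> Iterator[str]:
    """Yield only the BED lines whose TF/motif name is in `expressed`.

    name_col is the 0-based column holding the TF/motif name. The default of 3
    (the 4th BED column) is an assumption; check it against your own files.
    Comment/track/browser lines are passed through unchanged; blank lines are dropped.
    """
    for line in lines:
        if not line.strip():
            continue
        if line.startswith(("#", "track", "browser")):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > name_col and fields[name_col] in expressed:
            yield line


# Hypothetical toy lines; with real data you would iterate over an open file,
# e.g. open("overview/all_condition_A_bound.bed").
bed_lines = [
    "chr1\t100\t120\tTFx\n",
    "chr1\t300\t320\tTFy\n",
]
expressed_in_a = {"TFy"}  # e.g. TFs above an expression cutoff of your choice
kept = list(filter_bound_bed(bed_lines, expressed_in_a))
print(kept)  # ['chr1\t300\t320\tTFy\n']
```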

I would appreciate your thoughts on this.

Thanks Deep

msbentsen commented 1 year ago

Hi @deep-buddingcoder,

This is actually an issue we deal with a lot! We don't have any plans to integrate this into TOBIAS directly, but as you have seen, it is possible to do some post-processing of the TFBS in order to filter out non-expressing TFs. In your case, detecting a footprint for a non-expressed factor can be due to both:

So I would also recommend that you try to merge the RNA-seq into the results after the TOBIAS snakemake run, in order to create a filtered version of the results.
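A minimal sketch of such post-processing in plain Python, assuming the BINDetect results have been exported as a tab-separated table with a TF name column (called `name` here, which is an assumption) and that you have one expression value per TF and condition (e.g. TPM) from the RNA-seq data. Rather than deleting rows or editing scores, it adds a per-condition expression flag, so the footprint results stay intact and can be filtered or coloured afterwards. The helper name `annotate_expression` and the cutoff `min_expr=1.0` are hypothetical placeholders.

```python
import csv
import io
from typing import Dict, Iterable, List


def annotate_expression(
    rows: Iterable[Dict[str, str]],
    expression: Dict[str, Dict[str, float]],
    conditions: List[str],
    name_key: str = "name",
    min_expr: float = 1.0,
) -> List[Dict[str, str]]:
    """Add a '<condition>_expressed' column ('yes', 'no' or 'NA') to every results row.

    expression maps TF name -> {condition: expression value, e.g. TPM}.
    The footprint scores themselves are not modified; the new columns are only
    meant for filtering or colouring the results afterwards.
    """
    annotated = []
    for row in rows:
        new_row = dict(row)
        values = expression.get(row[name_key], {})
        for cond in conditions:
            if cond not in values:
                new_row[f"{cond}_expressed"] = "NA"  # no RNA-seq value for this TF
            else:
                new_row[f"{cond}_expressed"] = "yes" if values[cond] >= min_expr else "no"
        annotated.append(new_row)
    return annotated


# Hypothetical toy table standing in for a tab-separated export of the BINDetect results.
results_tsv = "name\tcondition_A_condition_B_change\nTFx\t-0.12\nTFy\t0.05\n"
expression = {"TFx": {"condition_A": 0.0, "condition_B": 35.2}}  # made-up values

rows = csv.DictReader(io.StringIO(results_tsv), delimiter="\t")
annotated = annotate_expression(rows, expression, ["condition_A", "condition_B"])

# Keep only TFs expressed in condition B (rows flagged "NA" are dropped as well).
expressed_in_b = [r["name"] for r in annotated if r["condition_B_expressed"] == "yes"]
print(expressed_in_b)  # ['TFx']
```

Keeping the flag as a separate column, instead of overwriting the scores, leaves the decision of how to treat "NA" TFs (no RNA-seq value) to the analyst.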

BR Mette

deep-buddingcoder commented 1 year ago

Hi Mette, thanks for your reply. Could you please elaborate on the following points?

  1. How would I do the "post-processing of the TFBS in order to filter out non-expressing TFs"? If I remove the TFx motif from the motif file, I will lose the footprint in condition B as well as in condition A.

  2. How would I "try to merge the RNA-seq into the results after the TOBIAS snakemake run, in order to create a filtered version of the results"?

Your insight will help me design a strategy for further processing of the data.

Thanks Deep

sufyazi commented 1 year ago

Hi @deep-buddingcoder,

Not Mette, but I have been following this discussion because I am interested in this topic as well. I believe what Mette meant by 1. is 'post-processing', i.e. AFTER you have run TOBIAS on the motif set you are interested in.

For 2., I guess it depends on what kind of questions you want to ask. You mentioned before that

My understanding is that, by integrating RNA-seq data, the "overview/all_condition_A_bound.bed" output can be further filtered/screened to keep only those footprints whose TF is known to be expressed in condition A.

If your research question is centred around TFx, doing this would lose the TFx story. Suppose you filter the output file to keep only footprints with motifs recognized by TFs expressed in condition A, and assume that the hypothesis Mette put forth above holds:

Binding of other TFs with similar motifs to the TF, as you also mention.

Then you would lose the footprints associated with these TFx motifs, no? Presumably these 'binding co-factors' have not been characterized, so there is no record of them matching TFx motifs. If you run the analysis after already removing the TFx-associated motifs from the motif file, you would obviously not see TFx in the condition B footprints, and you would also not realize that the TFx-associated motif coincides with a footprint of some other protein in condition A.

That's my 2 cents.

deep-buddingcoder commented 1 year ago

Hi @sufyazi, thanks for your insights.

My original point got lost in my not-so-crisp message.

Anyway, this is what I would like to do (although I am not sure which method to use to achieve it):

I do not want to change the footprint result per se (for example the TOBIAS_snakemake output overview/all_aggregate_comparison_bound.pdf). In this result, the footprint reflects the actual sequencing data, i.e. the true scenario, regardless of whether the footprint is caused by TFx or by any other TF (false positives aside).

But, using the RNA expression data as a reference, I do want to modify the TOBIAS_snakemake output overview/bindetect_results.xlsx such that:

If TFx is not expressed in condition A, is it possible to modify the following columns:

i) condition_A_for_TFx_totalData_mean_score
ii) condition_A_for_TFx_totalData_bound
iii) condition_A_for_TFx_totalData_condition_B_for_TFx_totalData_change
iv) condition_A_for_TFx_totalData_condition_B_for_TFx__totalData_pvalue

so that in the subsequent volcano plot TFx is highlighted/enriched in condition_B instead of condition_A? (Currently, TFx shows up as enriched in condition_A instead of condition_B.)

So what I need is any insight from you or Mette on the mathematics behind this proposed option of editing bindetect_results.xlsx. What kind of formula should I use so that the BINDetect output reflects the true expression status of a TF, and the value of "condition_A_for_TFx_totalData_condition_B_for_TFx_totalData_change" represents enrichment of TFx in condition_B?

Thanks Deep

msbentsen commented 1 year ago

Thank you for both your input @deep-buddingcoder and @sufyazi!

As I understand it, you want to have a way to force a TF to be upregulated in condition B, since you know it is not expressed in A:

If TFx is not expressed in condition A, is it possible to modify the following columns:

i) condition_A_for_TFx_totalData_mean_score
ii) condition_A_for_TFx_totalData_bound
iii) condition_A_for_TFx_totalData_condition_B_for_TFx_totalData_change
iv) condition_A_for_TFx_totalData_condition_B_for_TFx__totalData_pvalue

Unfortunately, I don't see a way to do this. The footprints will always be influenced by several TFs if these share motifs, so by changing the "_mean_score" or "_change" columns based on one TF, you are disregarding the observations from the footprinting of all the other TFs. As footprints are calculated independently of motifs, it is difficult to remove the footprint itself; we can only try to filter the assignment afterwards. If you hack it and set condition_A_mean_score = 0 because TFx is not expressed in condition A, you would be disregarding the possibility that other TFs (which are expressed) share this motif in condition A.

Can you clarify whether you have such a case of anti-correlated footprints/RNA? Since in your original message you write:

My TF of interest: TFx

Note: TFx is only expressed in condition B.

My observation: TOBIAS_snakemake detected a global footprint for TFx in condition A, although TFx is not expressed in condition A (I did not expect this). TOBIAS_snakemake also detected a larger (greater-magnitude) global footprint for TFx in condition B (I expected this).

This could be interpreted as an increase in footprint driven by TFx becoming expressed, with the "control" footprint in condition A being due to other TFs.

In general, it might be interesting for you to plot conditionA_conditionB_change (from TOBIAS) against conditionA_conditionB_log2fc (from RNA-seq). If you see very strong anti-correlation, this might also be an indicator of repressive behavior. TOBIAS assumes that TFs have activating properties, and the TOBIAS footprinting score also correlates with increases in accessibility; if a TF binds and causes chromatin compaction, that TF would end up on the wrong side of the volcano plot. I have seen a few cases of this myself, but have not found a stable way to adjust for it. These TFs usually do not create visible footprints in either condition, though. Just FYI to keep in mind.
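A rough numerical version of that comparison, as a plain-Python sketch: it takes a per-TF footprint change (from the BINDetect `_change` column) and a per-gene log2 fold change (from your RNA-seq differential expression), computes the Pearson correlation across TFs present in both, and lists TFs whose two changes pass a threshold but point in opposite directions, which is one way to read "anti-correlation" for individual TFs. The function names, thresholds and input values are hypothetical, TF names are assumed to already match gene names, and both inputs must be computed in the same direction of comparison (check the sign convention of each column). A scatter plot of the same two columns is the visual counterpart; Spearman correlation would be a reasonable alternative to Pearson.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; returns nan if either list has zero variance."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def compare_footprint_and_rna(
    footprint_change: Dict[str, float],
    rna_log2fc: Dict[str, float],
    min_change: float = 0.1,
    min_log2fc: float = 1.0,
) -> Tuple[float, List[str]]:
    """Correlate footprint change with RNA log2 fold change across TFs present in
    both dicts, and list TFs whose two changes pass the thresholds but have
    opposite signs (candidates for repressive behaviour, or just noise/motif sharing).
    """
    shared = sorted(set(footprint_change) & set(rna_log2fc))
    xs = [footprint_change[tf] for tf in shared]
    ys = [rna_log2fc[tf] for tf in shared]
    discordant = [
        tf
        for tf, x, y in zip(shared, xs, ys)
        if abs(x) >= min_change and abs(y) >= min_log2fc and x * y < 0
    ]
    return pearson(xs, ys), discordant


# Hypothetical per-TF values (made up for illustration).
change = {"TF1": 0.30, "TF2": 0.20, "TF3": -0.25, "TF4": 0.15}
log2fc = {"TF1": 2.0, "TF2": 1.5, "TF3": -1.8, "TF4": -2.2, "TF5": 3.0}

r, discordant = compare_footprint_and_rna(change, log2fc)
print(round(r, 2), discordant)  # discordant == ['TF4']
```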

BR Mette