Computation of hypothetical contributions

Karollus commented 3 years ago

I would like to ask about the computation of the "hypothetical" contribution scores (which TF-MoDISco requires).

I understand from the paper and the code (particularly seqmodel._contrib_deeplift_fn) that the contribution is computed using DeepLIFT as implemented in the DeepExplain package, with a zero sequence as the baseline.

Then, judging from the code in the class ContribFile in contrib.py, specifically the functions get_contrib and get_hyp_contrib, the actual and hypothetical contributions are computed as follows:

1) actual contribution = the DeepLIFT result * the one-hot input (so we zero everything except the signal for the base that was actually present)
2) hypothetical contribution = the DeepLIFT result (averaged over strands?)

Is this correct?
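To make the two definitions above concrete, here is a minimal NumPy sketch of how I picture them. It is not the bpnet code: the helper names one_hot and split_contributions are hypothetical, the raw score array is made-up stand-in data rather than real DeepLIFT output, and I leave out the possible strand averaging because I am not sure about it.

```python
import numpy as np


def one_hot(seq, alphabet="ACGT"):
    """One-hot encode a DNA string into an array of shape (len(seq), 4)."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = np.zeros((len(seq), len(alphabet)), dtype=np.float32)
    for pos, base in enumerate(seq):
        encoded[pos, index[base]] = 1.0
    return encoded


def split_contributions(raw_scores, onehot):
    """Hypothetical helper (not part of bpnet).

    raw_scores: per-position, per-base scores, shape (L, 4), assumed to be
                what the attribution method returns before any masking.
    onehot:     one-hot encoded input sequence, shape (L, 4).
    """
    hyp_contrib = raw_scores            # a score for every possible base
    contrib = raw_scores * onehot       # keep only the observed base
    return contrib, hyp_contrib


seq_onehot = one_hot("ACGT")
# Stand-in scores for illustration only (NOT real DeepLIFT output).
raw = np.arange(16, dtype=np.float32).reshape(4, 4) - 5.0
actual, hyp = split_contributions(raw, seq_onehot)
```

With this layout, each row of actual has at most one non-zero entry (the observed base), while hyp keeps a score for all four bases at every position.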

As I understand it, this works because of the following modification to DeepExplain: https://github.com/kundajelab/DeepExplain/commit/2348bc8d784d6e5d2e9afa9b3eac6eea5fc69829. By default, DeepExplain already computes signal * (input - baseline), which makes computing the hypothetical contributions impossible when a zero baseline is used. Is this correct?
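A small numerical sketch of why I think the default behaviour is a problem with a zero baseline. The multipliers array is invented for illustration and does not come from DeepExplain; the only assumption carried over is that the default output has the form signal * (input - baseline).

```python
import numpy as np

# Two positions, four bases (A, C, G, T); observed bases are A and G.
onehot = np.array([[1, 0, 0, 0],
                   [0, 0, 1, 0]], dtype=np.float32)
baseline = np.zeros_like(onehot)  # all-zero reference sequence

# Invented per-base "signal" values (NOT real DeepExplain output).
multipliers = np.array([[0.5, -0.2, 0.1, 0.3],
                        [-0.4, 0.6, 0.9, -0.1]], dtype=np.float32)

# Default-style output: signal * (input - baseline).
# With a zero baseline, (input - baseline) is just the one-hot input,
# so every base that was not observed is multiplied by zero.
default_style = multipliers * (onehot - baseline)

# Without that final multiplication, the scores for the unobserved
# bases survive and can be used as hypothetical contributions.
hypothetical = multipliers
```

Here default_style equals multipliers * onehot, so the scores for unobserved bases are lost after the multiplication, which is why the unmultiplied signal has to be exposed separately.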

I am sorry if all of this is obvious, but I can't find this information in the paper, and I would like to do a similar analysis using DeepLIFT (from DeepExplain) + TF-MoDISco, so I want to make sure my understanding is correct.

Best, Alex

Avsecz commented 3 years ago

Your understanding is correct. The TF-MoDISco paper provides a nice explanation of, and intuition for, hypothetical contribution scores. @AvantiShri do you have any other good pointers on hypothetical contribution scores for DeepLIFT?

AvantiShri commented 3 years ago

You could also reference my response in this github issue: https://github.com/kundajelab/tfmodisco/issues/5

Karollus commented 3 years ago

Dear @Avsecz and @AvantiShri,

Thank you for the clarification and the pointer! I now understand the hypothetical a lot better.

kundajelab / bpnet

Computation of hypothetical contributions #19