Open Sakurag1l opened 3 years ago
Hi @Sakurag1l, can you post the full stack trace (i.e. the full error message, which will pinpoint the line of code that threw the error)?
Thank you, I've solved that problem. But I have a question: does it matter if my hypothetical_contribs is the same as contrib_scores?
You will still get motifs, but you should expect they will be more split up. Say a motif can have either an A or a T at a particular position; without hypothetical scores, the version with A may be reported as a separate motif from the version with T. The hypothetical scores are meant to help tf-modisco recognize when the underlying motif is the same even though the actual kmer sequence may be different, because the hypothetical scores act as an "autocomplete" of the sequence (in that they reveal what type of pattern the network was detecting at a particular location).
Thanks! Are there other ways to generate hypothetical scores?
There are. What is the method you are currently using to generate your contribution scores?
Saliency, DeepLIFT and Saturation Mutagenesis
task_to_hyp_contrib_scores = hypothetical_contribs_many_refs_func( task_idx=1, input_data_sequences=seqs, num_refs_per_seq=10, batch_size=50, progress_update=4000, )
Does seqs only support one-hot encoding of shape N*4 when I use DeepLIFT to generate hypothetical scores? When I trained the model, my one-hot seqs had shape 4*N*1.
So the basic idea of hypothetical importance scores is that they are an estimate of what the contribution score would be if a different base were present in the sequence at a given position.
If "saliency" = "gradient * input", then the gradient on all the possible bases (i.e. not just the bases that are present in the actual sequence) would be your hypothetical scores. I assume you compute your saliency score right now by calculating the gradients on the input and then masking by the actual one-hot encoded sequence; just don't do the masking step and you have "hypothetical" scores. These scores literally give you the model's sensitivity to the other bases that could be present in the sequence.
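To make the masking point concrete, here is a minimal sketch in NumPy. It assumes a toy linear model whose gradient with respect to the input is simply its weight matrix; `toy_gradients`, `weights` and the sequence length are made up for illustration and are not part of tf-modisco. In a real setup the gradients would come from your deep learning framework.

```python
import numpy as np

def toy_gradients(onehot, weights):
    """Gradient of the toy model sum(onehot * weights) w.r.t. its input.

    For a linear model the gradient is just the weight matrix, independent
    of the input. This stands in for a real framework gradient call.
    """
    return np.broadcast_to(weights, onehot.shape).copy()

rng = np.random.default_rng(0)
seq_len = 6
weights = rng.normal(size=(seq_len, 4))            # hypothetical toy parameters
onehot = np.eye(4)[rng.integers(0, 4, seq_len)]    # (seq_len, 4) one-hot sequence

grads = toy_gradients(onehot, weights)

# No masking: one score for every possible base at every position.
hypothetical_contribs = grads

# Masking by the one-hot sequence gives "gradient * input":
# only the base actually present keeps a nonzero score.
contrib_scores = grads * onehot
```

With this convention, `contrib_scores` can always be recovered from `hypothetical_contribs` by multiplying with the one-hot sequence, but not the other way around.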
If using DeepLIFT, then there are a few ways to generate hypothetical scores:
If you used the original deeplift repository, you can get hypothetical scores as demonstrated in: https://github.com/kundajelab/tfmodisco/blob/master/examples/simulated_TAL_GATA_deeplearning/Generate%20Importance%20Scores.ipynb
To your question, yes, I assumed "N x channels". Can you add some permute/reshape layers to the beginning of your model to get it to take inputs of the "N x 4" shape? If you do that, then you can use the following DeepSHAP notebook to get hypothetical scores (you won't be able to use the deeplift implementation in the deeplift repository as I haven't implemented permute and reshape layers in it, but the DeepSHAP implementation of deeplift will work): https://colab.research.google.com/github/AvantiShri/shap/blob/5fdad0651176cdbf1acd6c697604a71529695211/notebooks/deep_explainer/Tensorflow%20DeepExplainer%20Genomics%20Example%20With%20Hypothetical%20Importance%20Scores.ipynb
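For the data side of that change, a rough NumPy sketch of converting between the two layouts is below. It assumes the arrays carry a leading batch axis, i.e. (batch, 4, N, 1) for the training layout and (batch, N, 4) for the layout expected here; both function names are hypothetical. Inside a Keras model, the analogous step would probably be a Permute layer followed by a Reshape layer placed right after the input, but that depends on your architecture.

```python
import numpy as np

def to_length_by_channels(batch):
    """Convert one-hot arrays of shape (batch, 4, N, 1) to (batch, N, 4)."""
    if batch.ndim != 4 or batch.shape[1] != 4 or batch.shape[3] != 1:
        raise ValueError(f"expected (batch, 4, N, 1), got {batch.shape}")
    # Drop the trailing singleton axis, then swap the base and position axes.
    return np.transpose(batch[..., 0], (0, 2, 1))

def to_channels_by_length(batch):
    """Inverse conversion: (batch, N, 4) back to (batch, 4, N, 1)."""
    if batch.ndim != 3 or batch.shape[2] != 4:
        raise ValueError(f"expected (batch, N, 4), got {batch.shape}")
    return np.transpose(batch, (0, 2, 1))[..., np.newaxis]

# Example: two random sequences of length 5 in the (batch, 4, N, 1) layout.
rng = np.random.default_rng(0)
base_idx = rng.integers(0, 4, size=(2, 5))
original = np.transpose(np.eye(4)[base_idx], (0, 2, 1))[..., np.newaxis]

converted = to_length_by_channels(original)   # shape (2, 5, 4)
restored = to_channels_by_length(converted)   # shape (2, 4, 5, 1)
```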
If using Saturation Mutagenesis: assuming you mutate each base to all 3 other possible bases, and you record the model output for all the bases, then it should be straightforward to compute what the mutagenesis score would have been if the other possible bases were present. You should compute your saturation mutagenesis scores in a way that retains sign information. What I like to do is to compute the value of the output logit over all 4 bases at each position (for one of the 4 bases this will be the same as the original model output), and then subtract the mean logit over all 4 from each of the 4 values. Bases that increase the output relative to the mean will get a positive score, and bases that decrease the output will get a negative score.
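The mean-subtraction described above can be sketched as follows. This is a minimal illustration, not code from tf-modisco: `mean_centered_ism` and `toy_logit` are hypothetical names, and the toy linear model only stands in for whatever function returns your model's output logit for a single (N, 4) one-hot sequence.

```python
import numpy as np

def mean_centered_ism(onehot, predict_logit):
    """Saturation mutagenesis scores that retain sign information.

    onehot: (N, 4) one-hot sequence.
    predict_logit: callable mapping an (N, 4) array to a scalar logit.
    Returns (hypothetical, actual), each of shape (N, 4).
    """
    seq_len, n_bases = onehot.shape
    logits = np.zeros((seq_len, n_bases))
    for pos in range(seq_len):
        for base in range(n_bases):
            mutated = onehot.copy()
            mutated[pos] = 0          # clear the original base
            mutated[pos, base] = 1    # place the candidate base
            logits[pos, base] = predict_logit(mutated)
    # Subtract the per-position mean over the 4 bases: bases above the mean
    # get positive scores, bases below the mean get negative scores.
    hypothetical = logits - logits.mean(axis=1, keepdims=True)
    # Scores for the bases actually present in the sequence.
    actual = hypothetical * onehot
    return hypothetical, actual

# Toy linear "model" used only to exercise the function.
rng = np.random.default_rng(0)
toy_weights = rng.normal(size=(8, 4))

def toy_logit(x):
    return float((x * toy_weights).sum())

seq = np.eye(4)[rng.integers(0, 4, 8)]
hyp_scores, actual_scores = mean_centered_ism(seq, toy_logit)
```

Note that this makes 4 * N model calls per sequence; batching the mutated sequences into a single prediction call would be the obvious speed-up.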
Does that make sense?
OK! Thanks!! I'm going to try to use DeepLIFT.
Hi Avanti, when I modified part of the example code, I got the error below. I don't know why. Could you help me out?
IndexError: cannot do a non-empty take from an empty axes.