Closed AngryMaciek closed 4 years ago
Are we looking for something like this? https://bioconductor.org/packages/release/bioc/vignettes/motifStack/inst/doc/motifStack_HTML.html#motifpiles (I am not sure about the tree-like structure on the left of that graph, though.) Or should we plot the heatmap (using matplotlib) and the sequence logos separately and then combine them into one figure?
The heatmap should be the main and most visible part of the plot, maybe even with the binding probabilities annotated inside the squares. Columns should be annotated with the sequence letters and rows with the motif names (and, if possible, sequence logos). We do not need motif clustering at all, so I would not go with your first suggestion. It would be best if you could find out which of the available solutions can plot a plain heatmap and also annotate its rows with additional sub-images of sequence logos.
Take a look at the section "Complex annotation" here: https://www.datanovia.com/en/lessons/heatmap-in-r-static-and-interactive-visualization/
I see that R provides some mechanisms to add other graphical elements as additional row annotations (alongside the row names). I think the motifStack library also provides a way to combine the sequence logos with other plots; take a look at its section on integration with ggplot.
I do not know if this is possible in matplotlib.
...
Thanks for the clarification.
I have found another great R library, ggtext, that might help us complete this job.
I will go through the basics of R so that I can understand the source code and documentation of ggtext and ggplot2, and then maybe I can write the code for the required graph.
Good idea. It looks like what we want to achieve is definitely possible; we just have to discover how.
As a template for an R script you might use mine:
https://github.com/AngryMaciek/angry-textfile-templates/blob/master/templates/template.r
Is your feature request related to a problem? Please describe.
At the end of the pipeline it is always a great idea to visualise the results. Gigabytes of text data are OK for us to work with, but at the very last step, for presentation, it is really nice to show off a cool, polished plot. We should summarise our results (binding sites) in a pretty figure.
Describe the solution you'd like
At the end of the pipeline (i.e. after #15; address #15 first) we should add one more step that creates a plot based on the combined results in the TSV format. We have to start somewhere, so let us start with the heatmaps with annotated binding posterior probabilities that we discussed. It would also be very nice to add miniatures of the PWMs to that plot (so that the end user looks at it and immediately sees all the information). My initial idea is a heatmap in which each square shows the probability both as a color (white=0, red=1) and as a number; the rows are annotated with the sequence logos, and the columns with the consecutive nucleotides (A, C, G, T) of the user's input sequence.
Additional context
Consider the two PWM plotting libraries I found previously:
https://logomaker.readthedocs.io/en/latest/
https://bioconductor.org/packages/release/bioc/vignettes/motifStack/inst/doc/motifStack_HTML.html
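For reference, the plain heatmap part of the idea above could be sketched in matplotlib roughly as follows. This is only a minimal sketch under assumptions: the function name `plot_binding_heatmap`, the motif names, and the probability values are hypothetical. It does not read the pipeline's TSV (its layout is not specified here) and it does not draw the sequence logos; rows are labelled with motif names only.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap


def plot_binding_heatmap(probs, motif_names, sequence, out_path=None):
    """Draw an annotated heatmap of binding posterior probabilities.

    probs[i][j] is assumed to be the probability that motif i binds at
    position j of the input sequence (hypothetical layout).
    """
    if len(probs) != len(motif_names):
        raise ValueError("need one row of probabilities per motif")
    if any(len(row) != len(sequence) for row in probs):
        raise ValueError("need one probability per sequence position")

    # Two-color map so that 0 is white and 1 is red.
    cmap = LinearSegmentedColormap.from_list("white_red", ["white", "red"])

    fig, ax = plt.subplots(
        figsize=(0.6 * len(sequence) + 2, 0.6 * len(motif_names) + 1)
    )
    image = ax.imshow(probs, cmap=cmap, vmin=0.0, vmax=1.0, aspect="equal")

    # Columns: consecutive nucleotides of the input sequence.
    ax.set_xticks(range(len(sequence)))
    ax.set_xticklabels(list(sequence))
    # Rows: motif names (sequence logos are not drawn in this sketch).
    ax.set_yticks(range(len(motif_names)))
    ax.set_yticklabels(motif_names)

    # Write the numeric probability inside every square.
    for i, row in enumerate(probs):
        for j, p in enumerate(row):
            ax.text(j, i, f"{p:.2f}", ha="center", va="center", fontsize=8)

    fig.colorbar(image, ax=ax, label="binding posterior probability")
    if out_path is not None:
        fig.savefig(out_path, dpi=150, bbox_inches="tight")
    return fig, ax


# Made-up example values, for illustration only.
example_motifs = ["motif_A", "motif_B"]
example_sequence = "ACGT"
example_probs = [
    [0.00, 0.25, 1.00, 0.50],
    [0.75, 0.10, 0.00, 0.90],
]

if __name__ == "__main__":
    plot_binding_heatmap(
        example_probs, example_motifs, example_sequence, out_path="heatmap.png"
    )
```

Fixing `vmin=0` and `vmax=1` keeps the color scale comparable between runs, so a given shade of red always means the same probability. Adding the logos next to the rows would still need a separate step, for example extra axes per row filled by one of the two libraries linked above.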