Closed krish8484 closed 4 years ago
Are the sequence logos as expected? They are in output/sequence_logos.
Dear Krish,
There is no directory called output anywhere I can see, especially not under the root of the repository... However, I have found some sample plots for HNRNPF under the unit test directory for this script, so I will refer to these.
It's OK, but I can think of 3 changes:
T to U, since it is an RNA sequence, not DNA. This is important.
Hi Maciek,
The directory output/sequence_logos will be in tests/integration after you run the Snakefile, though the png files in the unit tests depict a similar output.
Which T are you referring to in the first point? If it is the one in motif_HNRNPF_824.png, then that T comes from the input file. See motif_HNRNPF_824 in the same folder: the value of T for the 1st position is 97.115.
probability specified in the probability matrix in the input files (in the unit tests folder). Hence the range 0 to 1.
U
pyplot has these options, just google: https://stackoverflow.com/questions/9750699/how-to-display-only-a-left-and-bottom-box-border-in-matplotlib I see that you are using plt to show, so this solution should work.
If logomaker does not support plotting information content out-of-the-box, could you please calculate the contents yourself? According to these specifications: https://en.wikipedia.org/wiki/Position_weight_matrix#Information_content
We do not need to deal with background probabilities; the base of log is 2.
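For reference, the calculation requested above could look roughly like the sketch below. It assumes a uniform background over the four nucleotides and log base 2, as stated in the comment, so each position carries between 0 and 2 bits. The function names and the dict-based column layout are hypothetical and are not taken from the script in this PR. Values are normalised by their column sum first, which is an assumption made so that columns given in percent are handled the same way as columns given as probabilities.

```python
import math


def information_content(column):
    """Information content (in bits) of one motif position.

    `column` maps each nucleotide to its weight (probability or percent).
    Assumes a uniform background, so the maximum is log2(4) = 2 bits.
    """
    total = sum(column.values())
    probs = [value / total for value in column.values()]
    # Shannon entropy of the position; zero-probability letters contribute 0.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(4) - entropy


def letter_heights(column):
    """Height of each letter in the logo: probability times information content."""
    total = sum(column.values())
    ic = information_content(column)
    return {base: (value / total) * ic for base, value in column.items()}


# Hypothetical example column: A and C equally likely, G and U absent.
example = {"A": 0.5, "C": 0.5, "G": 0.0, "U": 0.0}
heights = letter_heights(example)
```

With heights computed this way, the y-axis of the logo ranges from 0 to 2 bits instead of 0 to 1. It may also be worth checking whether logomaker already offers a matrix transformation for this before hand-rolling it.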
As I was going through the Files changed section here on GitHub and reviewing the code, I noticed that under the dedicated directory for unit-testing this script you have also added the png sample output files to the repository (ex: tests/unit/plot_sequence_logos/motif_HNRNPF_820.png). Could you please remind me - do we need them in the repository? It just occurred to me that maybe it is not necessary to include them, since every unit test will generate them once again, right?
If that is correct - please remove the sample output files from the unit tests of this script as well as from the previous unit tests: combine_results, Plot-heatmap-for-motifs,
I have fixed the indentation in the Snakefile at the places you mentioned, removed the top and the right axes from the graph, changed the T to U in the input file in the unit tests, and changed the script to display information instead of probability.
Deleted the superfluous output files from unit tests and updated the md5sums of the output files.
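For anyone reading along, hiding the top and right axes as described above can be sketched with plain matplotlib as follows. This is a minimal illustration on a placeholder bar chart, not the actual code from the script in this PR; the data values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Placeholder data standing in for the logo drawn on this axes object.
ax.bar(["A", "C", "G", "U"], [0.1, 0.2, 0.3, 0.4])

# Keep only the left and bottom borders of the plot.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
```

The same loop should apply to whatever axes object the logo is drawn on.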
Before I merge it, please make sure that you have complied with all the items from the Checklist in the initial post of the PR.
@krish8484 I have cleaned the code a little, indents (🙃), removed unnecessary expansions and, importantly, added two more sections to all your rules (which I forgot about earlier):
benchmark - it allows snakemake to report the execution time per rule into a text file
log - to capture the stdout and stderr streams of each script's execution into another two text files; very useful for debugging.
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
This PR aims to plot the sequence logos using Python's logomaker, which may be used along with the heatmaps in the future.
Fixes #23
Type of change
Please delete options that are not relevant.
Checklist: