zavolanlab / bindz-rbp

RBP module for bindz, a bioinformatics tool to detect regulators' binding sites on RNA sequences.
https://github.com/zavolanlab/bindz-rbp
Apache License 2.0
6 stars 1 forks source link

Add sequence logos to heatmap #27

Closed krish8484 closed 4 years ago

krish8484 commented 4 years ago

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. Added sequence logos pngs as y axis ticks for heatmap. This sequence logos will be added only if the flag is set.

Fixes #26

Type of change

Please delete options that are not relevant.

Checklist:

krish8484 commented 4 years ago

The sample svg file is in tests/unit/Plot-heatmap-for-motifs. With the inclusion of logos to the heatmap, I have made some minor changes in sequence logos and heatmap to make the final graph prettier. They can be reversed though with few changes.

Note that the file location of sequence-logos are kept in terms of integration and not unit. So one would have to run the integration tests first to plot the logos and then run the script.

AngryMaciek commented 4 years ago

Note that the file location of sequence-logos are kept in terms of integration and not unit. So one would have to run the integration tests first to plot the logos and then run the script.

This is not a proper unit-test then; unit tests have to be encapsualted to contain all the input required. You can run the integration test once and then copy whatever you need to a dedicated unit-test directory, right?

krish8484 commented 4 years ago

Note that the file location of sequence-logos are kept in terms of integration and not unit. So one would have to run the integration tests first to plot the logos and then run the script.

This is not a proper unit-test then; unit tests have to be encapsualted to contain all the input required. You can run the integration test once and then copy whatever you need to a dedicated unit-test directory, right?

This is not what I meant. While plotting the logos, we need the png paths of the the logos, right? So I can either put the location of the images generated from the integration tests or the unit tests.We cant keep both, currently I have put the location of the integration tests.

There is another option where we can mention all the paths in the command line, this is a better option but then the code would have to depend on the user to provide multiple long paths to the images(for eg. if there are 100 motifs there have to be 100 paths in the command line addressing the sequence logos), plus the user must provide the paths for all motif, it is likely that one or too may miss out. This solution might be inconvenient to the user.

AngryMaciek commented 4 years ago

Can you provide path to the directory with all pngs? If that argument is missing - plot simple heatmap.

krish8484 commented 4 years ago

Can you provide path to the directory with all pngs? If that argument is missing - plot simple heatmap.

Yes, updated the script to do this.

AngryMaciek commented 4 years ago

There is one more thing I just noticed - the latters on X-axis should also have T->U, of course. Could you please change it in the plotting script? Also: do we need the svg output file in the unit test in this repo? As I said earlier - these will be created during the tests, right?

krish8484 commented 4 years ago

Replaced the T with U in input sequence. That output svg file is in the unit tests so that you can review it without running the script. I will delete it after the heatmap looks OK and also update the md5sums.

AngryMaciek commented 4 years ago

I think it looks good, you can finish the branch off. Have you ever tested how does it look like with 100 motifs?

krish8484 commented 4 years ago

Have you ever tested how does it look like with 100 motifs?

No, I haven't.

AngryMaciek commented 4 years ago

No, I haven't.

Could you please do it and post the screenshot of the plot here?

krish8484 commented 4 years ago

I had created a dummy data of 108 motifs and this is what I got. I think we should use pdf in place of svg, its better in terms of quality and we need to find a way or some formula to scale the size of pdf depending upon the number of motifs. ProbabilityvsSequences.pdf

AngryMaciek commented 4 years ago

Can we scale the height of the pdf the same as svg? I saw that you have just added a scaling mechanism<?> I presume it will not raise any problems if we convert to PDFs now? Also - the width is not optimal either... Could you add scaling to the length of the sequence? Of course some margin will be required for the colour bar and the row annotations...

krish8484 commented 4 years ago

Can we scale the height of the pdf the same as svg? I saw that you have just added a scaling mechanism<?> I presume it will not raise any problems if we convert to PDFs now?

Yes, we can change the height and width but that formula which I used is not accurate, we need to think of something better. We can use if-else(changing the size of pdf depending on number of motifs), but since we dont have an upper limit to the number of motifs, even if-else is not very efficient. Same for input-sequence.

AngryMaciek commented 4 years ago

In that case I would propose that you make a separate issue for this - these are aesthetics then, not a very hight priority. I just wanted to check that nothing breaks for bigger datasets.
We have more pressing matters to focus on (in terms of bringing this tool live).

krish8484 commented 4 years ago

I had to make final changes regarding changing svg to pdf and setting LIMIT to FALSE.

AngryMaciek commented 4 years ago

OK, you will have to clone dev again, create the feature branch once again, and file another PR?

krish8484 commented 4 years ago

OK