Extend pipeline to combine results in one tsv file

zavolanlab / bindz-rbp

RBP module for bindz, a bioinformatics tool to detect regulators' binding sites on RNA sequences.

https://github.com/zavolanlab/bindz-rbp

Apache License 2.0

7 stars 1 forks source link

Extend pipeline to combine results in one tsv file #18

Closed krish8484 closed 4 years ago

krish8484 commented 4 years ago

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

All the motevo results are encapsulated in one tsv file

Fixes #15

Type of change

Please delete options that are not relevant.

[x] New feature (non-breaking change which adds functionality)

Checklist:

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] My changes generate no new warnings
[x] I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes
[x] I have not reduced the existing code coverage

AngryMaciek commented 4 years ago

@krish8484 Apart from what is in the code comments, I have unchecked some boxes from the checklist that still need to be addressed. As you change how the script is called you will also have to add unit test for your script (the same way I have for another one). In order to generate dummy input for the unit test just run the pipeline once, copy the main directory with MotEvo results (per motif directories) and remove all of them except, let's say, arbitrarily chosen 10 (so that the test is rather small).

And then, add the unit test to the CI as well.
Don't forget to create the expected output txt and md5 files (all the same as for the other unit test).

krish8484 commented 4 years ago

Okay, i had done it the wrong way, will fix it thanks!

krish8484 commented 4 years ago

All the changed mentioned above have been implemented. Both the md5sum --check commands are returning OK on my local machine, but failing on the travis server. I am not understanding the reason.

AngryMaciek commented 4 years ago

I have made some minor comments, please remove unnecessary files first, before I git into the script itself. Also, please remove all .snakemake_timestamp files. There are internal time stamps that should not be under version control;

krish8484 commented 4 years ago

The log files were auto generated after running the pipeline, you are right they were not needed, so I deleted them now. The integration tests are passing sometime and failing other time because the order of input directories that the snakemake file receives is not the same all the time, so the order of tsv file changes according to the input. I think unit tests do a good job of checking the script. I have implemented the other suggested changes.

AngryMaciek commented 4 years ago

The integration tests are passing sometime and failing other time because the order of input directories that the snakemake file receives is not the same all the time, so the order of tsv file changes according to the input. I think unit tests do a good job of checking the script.

OK, thanks! Looks good now. But before I merge we need to take care about this randomness effect. Reproducibility is a very important aspect of data analysis and we cannot leave any non-deterministic effects like that. That is a easy fix, though - just sort the list of paths to the files with sorted() buitl-in function. Then the order will always be the same.

AngryMaciek commented 4 years ago

OK, looks very good now. ╰(▔∀▔)╯

As I was going through the files in detail I also noticed that you do not have a conda virtual environment for your rule, right? Each snakemake rule is executed in a separated, dedicated virtual environment which contains all the packages needed. They are automacitally built by snakemake during the pipeline execution, built based on the YAML files. Take a look at other rules in the workflow. Could you please add a YAML file with specifications related to your script and a proper field in the Snakefile?

And please remember to run black formatter on your python script (I am not sure if you did).

AngryMaciek commented 4 years ago

Please also run a rulegraph run test to generate new image of the workflow that will be displayed in the README (it should contain your rule). The new image file should substitute the old one in images directory.

krish8484 commented 4 years ago

Updated the image of the workflow, added env file, formatted using black and changed the path of parse.py