Fix sequence logo parallelization bug

zavolanlab / bindz-rbp

RBP module for bindz, a bioinformatics tool to detect regulators' binding sites on RNA sequences.

https://github.com/zavolanlab/bindz-rbp

Apache License 2.0

Fix sequence logo parallelization bug #31

Closed krish8484 closed 4 years ago

krish8484 commented 4 years ago

Description

Remove the expand function to avoid overwriting of sequence logos.

Fixes #30

Type of change

[x] Bug fix (non-breaking change which fixes an issue)

Checklist:

[x] My code follows the style guidelines of this project
[x] I have performed a self-review of my own code
[x] I have commented my code, particularly in hard-to-understand areas
[x] My changes generate no new warnings
[x] I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes
[x] I have not reduced the existing code coverage

AngryMaciek commented 4 years ago

The interface of the script has to be corrected as well: we do not need input_files with nargs="+". You do not need to sort it or iterate over it; it will be just one path.
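A minimal sketch of what a single-path interface could look like. The option names `--input_file` and `--output_location`, and the example paths, are assumptions for illustration only; they are not taken from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one PWM path (hypothetical option names)."""
    parser = argparse.ArgumentParser(
        description="Plot a sequence logo for a single PWM file."
    )
    # No nargs="+": argparse stores a single string, so no sorting or looping is needed.
    parser.add_argument("--input_file", required=True, help="Path to one PWM file.")
    parser.add_argument("--output_location", required=True, help="Where to write the logo.")
    return parser


# Example invocation with made-up paths.
args = build_parser().parse_args(
    ["--input_file", "pwms/motif_A", "--output_location", "logos/motif_A"]
)
print(args.input_file)
```

With this shape, each pipeline job would pass one PWM path and get one logo back, which matches the "just one path" requirement.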

AngryMaciek commented 4 years ago

Now when I look into your code I do not see how this would work... When do you call gather_motifs_names(config["pwm_directory"]) ?

The way I see it now - after the small modification - is that you will catch the whole directory after expansion - this is not what we want, please look at the description of #30

Also, I am not sure the DAG corresponds to the new Snakefile. After each such modification you have to update the graphs and check them manually to confirm that the data analysis flow is what we want.

krish8484 commented 4 years ago

I have added the sequence logos as an input to rule plot_heatmap_of_MotEvo_results in the Snakemake pipeline.

> Now when I look into your code I do not see how this would work... When do you call gather_motifs_names(config["pwm_directory"]) ?

Earlier it was called from rule all; now it is called from rule plot_heatmap_of_MotEvo_results.

> The way I see it now - after the small modification - is that you will catch the whole directory after expansion - this is not what we want, please look at the description of #30

No, the script takes one file at a time, not the directory. You can confirm this from the terminal output after running the pipeline.
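A rough plain-Python sketch of the one-file-per-job idea, outside Snakemake. The body of `gather_motifs_names` is an assumption (the real helper may behave differently), and `logo_targets`, the `logos` directory and the `.png` suffix are hypothetical names used only for illustration.

```python
import os
import tempfile


def gather_motifs_names(pwm_directory):
    """Assumed behaviour: return the sorted file names found in the PWM directory."""
    return sorted(os.listdir(pwm_directory))


def logo_targets(pwm_directory, logo_directory):
    """Map every motif to its own logo path, so no two jobs share an output."""
    return {
        name: os.path.join(logo_directory, name + ".png")
        for name in gather_motifs_names(pwm_directory)
    }


# Build a throwaway PWM directory with two empty motif files.
with tempfile.TemporaryDirectory() as pwm_dir:
    for name in ("motif_B", "motif_A"):
        open(os.path.join(pwm_dir, name), "w").close()
    targets = logo_targets(pwm_dir, "logos")

# Each motif resolves to a distinct target; a job receives one path, not the directory.
for motif, target in targets.items():
    print(motif, "->", target)
```

Because every motif maps to a separate output path, jobs running in parallel would not overwrite each other's logos.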

I have updated the DAG and the rulegraph.