Closed krish8484 closed 4 years ago
@krish8484 Apart from what is in the code comments, I have unchecked some boxes from the checklist that still need to be addressed. As you change how the script is called you will also have to add unit test for your script (the same way I have for another one). In order to generate dummy input for the unit test just run the pipeline once, copy the main directory with MotEvo results (per motif directories) and remove all of them except, let's say, arbitrarily chosen 10 (so that the test is rather small).
And then, add the unit test to the CI as well.
Don't forget to create the expected output txt and md5 files (all the same as for the other unit test).
Okay, i had done it the wrong way, will fix it thanks!
All the changed mentioned above have been implemented. Both the md5sum --check
commands are returning OK
on my local machine, but failing on the travis server. I am not understanding the reason.
I have made some minor comments, please remove unnecessary files first, before I git into the script itself.
Also, please remove all .snakemake_timestamp
files. There are internal time stamps that should not be under version control;
The log files were auto generated after running the pipeline, you are right they were not needed, so I deleted them now. The integration tests are passing sometime and failing other time because the order of input directories that the snakemake file receives is not the same all the time, so the order of tsv file changes according to the input. I think unit tests do a good job of checking the script. I have implemented the other suggested changes.
The integration tests are passing sometime and failing other time because the order of input directories that the snakemake file receives is not the same all the time, so the order of tsv file changes according to the input. I think unit tests do a good job of checking the script.
OK, thanks! Looks good now. But before I merge we need to take care about this randomness effect. Reproducibility is a very important aspect of data analysis and we cannot leave any non-deterministic effects like that. That is a easy fix, though - just sort the list of paths to the files with sorted()
buitl-in function. Then the order will always be the same.
OK, looks very good now. ╰(▔∀▔)╯
As I was going through the files in detail I also noticed that you do not have a conda virtual environment for your rule, right? Each snakemake rule is executed in a separated, dedicated virtual environment which contains all the packages needed. They are automacitally built by snakemake during the pipeline execution, built based on the YAML files. Take a look at other rules in the workflow. Could you please add a YAML file with specifications related to your script and a proper field in the Snakefile?
And please remember to run black formatter on your python script (I am not sure if you did).
Please also run a rulegraph run
test to generate new image of the workflow that will be displayed in the README (it should contain your rule). The new image file should substitute the old one in images
directory.
Updated the image of the workflow, added env file, formatted using black
and changed the path of parse.py
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
All the motevo results are encapsulated in one tsv file
Fixes #15
Type of change
Please delete options that are not relevant.
Checklist: