Closed by xin-huang 6 months ago
In fact, intronets_simulate_training_set_fixed_sample_size.smk and intronets_simulate_training_set_many_samples.smk are mainly for my own testing purposes.
For the next steps, the workflow in intronets_simulate_training_set_many_samples.smk will be important, because it does not simulate a fixed number of samples. Instead, it repeats the simulations until enough introgressed samples have been found.
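The "repeat until enough" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `simulate_batch` is a hypothetical stand-in that draws random flags rather than running a real simulation, and the names `collect_until_enough`, `batch_size`, `p_introgressed`, and `max_batches` are made up for this example and do not come from the repository.

```python
import random
from typing import List, Tuple


def simulate_batch(rng: random.Random, batch_size: int, p_introgressed: float) -> List[bool]:
    """Hypothetical stand-in for one simulation run.

    Returns one flag per simulated sample; True marks an introgressed sample.
    A real workflow would run the simulator and inspect its output instead.
    """
    return [rng.random() < p_introgressed for _ in range(batch_size)]


def collect_until_enough(
    target: int,
    batch_size: int = 50,
    p_introgressed: float = 0.1,
    seed: int = 0,
    max_batches: int = 10_000,
) -> Tuple[int, int]:
    """Run batches until at least `target` introgressed samples are found.

    Returns (number of introgressed samples found, number of batches run).
    `max_batches` is a safety limit so the loop cannot run forever.
    """
    rng = random.Random(seed)
    found = 0
    n_batches = 0
    while found < target:
        if n_batches >= max_batches:
            raise RuntimeError("gave up before finding enough introgressed samples")
        batch = simulate_batch(rng, batch_size, p_introgressed)
        found += sum(batch)  # count the True flags in this batch
        n_batches += 1
    return found, n_batches


if __name__ == "__main__":
    found, n_batches = collect_until_enough(target=100)
    print(f"found {found} introgressed samples in {n_batches} batches")
```

The number of batches is not known in advance, which is the main difference from a fixed-sample-size workflow; the safety limit guards against a setting in which introgressed samples never appear.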
I created a new branch, original_model.
In this branch, you can find four Snakemake files for replicating the introunet model (the YAML file is updated accordingly). As with the Snakemake files in the batches branch, the simulation files are produced, added, and removed in batches.
If I am not mistaken, all unused and duplicated lines have been removed, including in the training and inference scripts (which are now really slim).
I think the workflow is now much clearer.
Hello @jalhackl
I'd like you to refactor the intronets_simulate_training_set_fixed_sample_size.smk and intronets_simulate_training_set_many_samples.smk files, using intronets_simulate_training_set.smk as a template.
Specifically, please focus on the following tasks:
Remove Unused and Duplicated Lines: Identify and eliminate any redundant code in these files to streamline our workflow.
Split the Monolithic all Rule: Currently, the all rule in our Snakemake files performs multiple steps at once. Please divide it into several distinct rules. This will help clarify the workflow's logic by making the input, output, parameters, and resources required at each step explicit and well-documented.
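As a rough illustration of the second task, a split workflow might look like the sketch below. All rule names, paths, script names, the `n_batches` config key, and the resource values are hypothetical placeholders, not taken from the repository. The `all` rule only lists the final targets, and each step declares its own input, output, params, and resources.

```snakemake
# Hypothetical sketch: rule names, paths, scripts, and config keys are
# illustrative assumptions, not taken from the repository.
configfile: "config.yaml"  # assumed to define "n_batches"

BATCHES = range(config["n_batches"])


# `all` only lists the final targets; it performs no work itself.
rule all:
    input:
        expand("results/training/batch_{b}.h5", b=BATCHES),


# Step 1: simulate one batch.
rule simulate_batch:
    output:
        ts="results/sim/batch_{b}.ts",
    params:
        # Derive a distinct seed per batch from the wildcard.
        seed=lambda wildcards: 1000 + int(wildcards.b),
    resources:
        mem_mb=4000,
    log:
        "logs/simulate_batch_{b}.log",
    shell:
        "python scripts/simulate.py --seed {params.seed} "
        "--out {output.ts} > {log} 2>&1"


# Step 2: convert the simulated batch into training data.
rule build_training_set:
    input:
        ts=rules.simulate_batch.output.ts,
    output:
        h5="results/training/batch_{b}.h5",
    resources:
        mem_mb=8000,
    shell:
        "python scripts/build_training_set.py --in {input.ts} --out {output.h5}"
```

Because `build_training_set` refers to `rules.simulate_batch.output.ts`, the dependency between the two steps is stated in one place, and a dry run with `snakemake -n` should list each step as a separate job.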
By refactoring these files, you'll not only practice writing cleaner code but also deepen your understanding of Snakemake and workflow management.
This is a great opportunity to improve our code's readability and maintainability, as well as to get hands-on practice with Snakemake's best practices.
Thank you.