jalhackl / introunet


Refactor `intronets_simulate_training_set_fixed_sample_size.smk` and `intronets_simulate_training_set_many_samples.smk` #18

Closed xin-huang closed 6 months ago

xin-huang commented 6 months ago

Hello @jalhackl

I'd like you to refactor the intronets_simulate_training_set_fixed_sample_size.smk and intronets_simulate_training_set_many_samples.smk files, using intronets_simulate_training_set.smk as a template.

Specifically, please focus on the following tasks:

By refactoring these files, you'll not only practice writing cleaner code but also deepen your understanding of Snakemake and workflow management.

This is a great opportunity to improve our code's readability and maintainability, as well as to get hands-on practice with Snakemake's best practices.

Thank you.

jalhackl commented 6 months ago

In fact, intronets_simulate_training_set_fixed_sample_size.smk and intronets_simulate_training_set_many_samples.smk are mainly for my own testing. For the next steps, the workflow of intronets_simulate_training_set_many_samples.smk will be the important one, because it does not simulate a fixed number of samples; instead, the simulations are repeated until enough introgressed samples have been found.
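To illustrate the "simulate until enough introgressed samples are found" idea, here is a minimal Python sketch. It is not the code from the repository: `toy_simulate_batch`, the `(replicate id, is_introgressed)` record, the 20% introgression rate, and all parameter names are hypothetical stand-ins for the real simulator and its output.

```python
import random
from typing import List, Tuple

# Hypothetical record: (replicate id, whether the sample is introgressed).
Sample = Tuple[int, bool]


def toy_simulate_batch(batch_index: int, batch_size: int,
                       rng: random.Random) -> List[Sample]:
    """Stand-in for one simulation batch (assumption: ~20% introgressed)."""
    start = batch_index * batch_size
    return [(start + i, rng.random() < 0.2) for i in range(batch_size)]


def simulate_until_enough(target: int, batch_size: int = 50,
                          max_batches: int = 1000,
                          seed: int = 0) -> Tuple[List[Sample], int]:
    """Run batches until `target` introgressed samples have been collected.

    Returns the kept samples and the number of batches that were simulated.
    """
    rng = random.Random(seed)
    kept: List[Sample] = []
    n_batches = 0
    while len(kept) < target:
        if n_batches >= max_batches:
            # Guard against an endless loop if introgression is very rare.
            raise RuntimeError("max_batches reached before target was met")
        batch = toy_simulate_batch(n_batches, batch_size, rng)
        # Keep only the introgressed samples from this batch.
        kept.extend(sample for sample in batch if sample[1])
        n_batches += 1
    # The last batch may overshoot, so trim to exactly `target` samples.
    return kept[:target], n_batches


if __name__ == "__main__":
    samples, batches = simulate_until_enough(target=30)
    print(f"collected {len(samples)} introgressed samples in {batches} batches")
```

The `max_batches` guard is a design choice of this sketch: unlike a fixed-sample-size run, the number of iterations is not known in advance, so an upper bound avoids a workflow that never terminates.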

I created a new branch, original_model. In this branch, you can find four Snakemake files for replicating the introunet model (the YAML file is updated accordingly). As with the Snakemake files in the batches branch, the simulation files are produced, added, and removed in batches. If I am not mistaken, all unused and duplicated lines have been removed, including in the training and inference scripts (which are now really slim).
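The batch-wise handling of simulation files could look roughly like the following Python sketch. Everything here is an assumption for illustration: the file names (`batch_<n>.txt`, `combined.txt`), the plain-text record format, and the helper functions are hypothetical and do not reflect the actual file layout in the original_model branch.

```python
import tempfile
from pathlib import Path


def write_batch(work_dir: Path, batch_index: int, n_records: int) -> Path:
    """Produce one (hypothetical) batch file with one record per line."""
    path = work_dir / f"batch_{batch_index}.txt"
    lines = [f"batch{batch_index}_rep{i}" for i in range(n_records)]
    path.write_text("\n".join(lines) + "\n")
    return path


def collect_in_batches(work_dir: Path, n_batches: int, n_records: int) -> Path:
    """Produce each batch, add it to the combined file, then remove it."""
    combined = work_dir / "combined.txt"
    combined.write_text("")  # start with an empty combined data set
    for batch_index in range(n_batches):
        batch_file = write_batch(work_dir, batch_index, n_records)
        with combined.open("a") as out:
            out.write(batch_file.read_text())
        # Delete the intermediate file so only one batch is on disk at a time.
        batch_file.unlink()
    return combined


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = collect_in_batches(Path(tmp), n_batches=3, n_records=4)
        print(len(result.read_text().splitlines()), "records collected")
```

Removing each batch file right after it is appended keeps disk usage bounded by a single batch, which is the main reason to work in batches at all.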

I think the workflow is now much clearer.