ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

Refactor main code #27

Closed joshfactorial closed 1 year ago

joshfactorial commented 2 years ago

We want a single point of entry with suboptions, rather than a fragmented code base, similar to how GATK is called for all of its subfunctions:

gatk [ToolName] ToolArguments

I think most of the model-creation scripts could become flags in the main gen_reads workflow. Rather than running gen_mut_model.py and then gen_reads.py, we would have one command: if you need to create the mutation model, you can do it either as part of the gen_reads run or as a standalone process.

neat GenMutModel -i /path/to/file

neat GenReads -i /path/to/file --gmm /path/to/other_file
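
As a rough sketch of what that single entry point might look like, here is a minimal argparse dispatcher with one subcommand per tool. Only the tool names and the -i and --gmm options come from the commands above; the function names (build_parser, run_gen_mut_model, run_gen_reads, main) and the placeholder return values are hypothetical and stand in for the real workflows.

```python
import argparse


def build_parser():
    """Build a 'neat' parser with one subcommand per tool (names from the issue)."""
    parser = argparse.ArgumentParser(prog="neat")
    subparsers = parser.add_subparsers(dest="tool", required=True)

    gmm = subparsers.add_parser("GenMutModel", help="create a mutation model")
    gmm.add_argument("-i", dest="input", required=True, help="input file")

    reads = subparsers.add_parser("GenReads", help="simulate reads")
    reads.add_argument("-i", dest="input", required=True, help="input file")
    # Optional path to a previously generated mutation model.
    reads.add_argument("--gmm", default=None, help="mutation model file")
    return parser


def run_gen_mut_model(args):
    # Hypothetical placeholder: the real code would build the mutation model here.
    return f"GenMutModel input={args.input}"


def run_gen_reads(args):
    # Hypothetical placeholder: the real code would run the read simulation here.
    return f"GenReads input={args.input} gmm={args.gmm}"


# Map each subcommand name to the function that handles it.
DISPATCH = {
    "GenMutModel": run_gen_mut_model,
    "GenReads": run_gen_reads,
}


def main(argv=None):
    """Parse argv (or sys.argv[1:] when argv is None) and run the chosen tool."""
    args = build_parser().parse_args(argv)
    return DISPATCH[args.tool](args)
```

A launcher script in the main folder would then only need to call main() with no arguments, so that argparse reads the command line itself.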

The key component here is to move all the Python code to Source, so that we don't have to mess around with importing from neighboring modules, as we currently do in genSeqErrorModel.py (which imports the DiscreteDistribution class, though that can probably be replaced with standard modules). We would use a bash or Python launcher in the main folder to call the source code and perform the actual runs.
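
On the standard-modules point, here is a minimal sketch of weighted sampling with the standard library's random.choices. It assumes that what genSeqErrorModel.py needs from DiscreteDistribution is drawing values according to weights; the class's actual interface is not shown in this issue, and sample_discrete is a hypothetical helper name.

```python
import random


def sample_discrete(values, weights, k=1, rng=None):
    """Draw k values with replacement, each chosen in proportion to its weight.

    Hypothetical stand-in for a weighted-sampling helper; pass a seeded
    random.Random instance as rng for reproducible draws.
    """
    rng = rng or random
    return rng.choices(values, weights=weights, k=k)


# Example: sample five bases with unequal weights.
bases = sample_discrete(["A", "C", "G", "T"], [0.1, 0.2, 0.3, 0.4], k=5)
```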