ncsa / NEAT

NEAT (NExt-generation Analysis Toolkit) simulates next-gen sequencing reads and can learn simulation parameters from real data.

Overhaul probabilities #25

Closed joshfactorial closed 1 year ago

joshfactorial commented 2 years ago

This will need to be broken up into smaller pieces, but the general idea is to overhaul how probabilities are calculated in NEAT. The current code, especially probability.py, is a performance bottleneck.

Specifics:

1) compute_fraglen.py could just calculate a median and standard deviation (or equivalent), which could then be used as input to a standard probability distribution centered on that point.
2) Investigate the same approach for sequencing error.
3) Investigate the same approach for the mutation model.
4) In SequenceContainer, investigate how to improve the random mutation models.
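
To make item 1 concrete, here is a minimal sketch of the idea: summarise the observed fragment lengths as a median and standard deviation, then sample from a distribution centered on that point. It is not NEAT's actual code. The function names, the sample data, and the choice of a normal distribution are all assumptions made for illustration.

```python
import random
import statistics


def fit_fragment_model(lengths):
    """Summarise observed fragment lengths as (median, sample std dev)."""
    if len(lengths) < 2:
        raise ValueError("need at least two fragment lengths")
    return statistics.median(lengths), statistics.stdev(lengths)


def sample_fragment_length(center, spread, rng, min_length=1, max_tries=100):
    """Draw one fragment length from a normal distribution around `center`.

    Draws below `min_length` are rejected and redrawn. After `max_tries`
    rejections, fall back to the center clamped to `min_length`.
    """
    for _ in range(max_tries):
        value = round(rng.gauss(center, spread))
        if value >= min_length:
            return value
    return max(min_length, round(center))


# Made-up fragment lengths, standing in for values parsed from real data.
observed = [280, 295, 300, 300, 305, 310, 320]
center, spread = fit_fragment_model(observed)

rng = random.Random(0)  # seeded so runs are repeatable
samples = [sample_fragment_length(center, spread, rng) for _ in range(1000)]
```

Storing only two numbers instead of a full empirical distribution keeps the model small and makes sampling cheap. The trade-off is that a normal distribution may fit skewed fragment-length data poorly, so the choice of distribution would need to be checked against real libraries.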