It would be useful for test cases and for benchmarking if we had an easy way to generate synthetic data. Not that we want to be too heavily reliant on synthetic data, but it could give us a sense of what the limit of detection might look like in an ideal case, and it could also be a good sanity check - i.e., do the assumptions baked into our model fitting actually produce data that looks like what we expect? :)
Here are a few things I thought would be worth thinking about while we get started:
Do we want a probabilistic model, for example a Markov model (which could be nice because it might be able to generate some complex test cases), or a deterministic model (where we could tune parameters of interest) - or should we have both! :)
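Just to make the probabilistic option concrete, here's a minimal sketch of what a two-state Markov model could look like - a gene region that switches between an "off" and a "transcribed" state along the genome. All the names and parameter values (`p_on`, `p_off`, `depth`) are placeholders, not a proposal for the real model:

```python
import numpy as np

def markov_coverage(n_positions, p_on=0.01, p_off=0.005, depth=100, seed=0):
    """Toy two-state Markov model of expected coverage.

    At each position the region may switch from 'off' to 'on' (prob p_on)
    or from 'on' back to 'off' (prob p_off). Expected coverage is `depth`
    while 'on' and 0 while 'off'. Purely illustrative.
    """
    rng = np.random.default_rng(seed)
    state = 0  # start untranscribed
    expected = np.zeros(n_positions)
    for i in range(n_positions):
        if state == 0 and rng.random() < p_on:
            state = 1
        elif state == 1 and rng.random() < p_off:
            state = 0
        expected[i] = depth * state
    return expected
```

Even something this simple can generate fairly complex test cases (variable numbers and lengths of transcribed regions), which is part of the appeal of the probabilistic route.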
What features do we want our model to capture - i.e., can we build in ramps? Multiple transcription start sites and end sites? What other features are biologically relevant?
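On the deterministic side, features like ramps and multiple start sites could be composed additively. Here's a rough sketch, assuming each TSS contributes a linear ramp up to a plateau and everything ends at a shared end site - the function name, parameters, and the "shared end" assumption are all mine, just for illustration:

```python
import numpy as np

def deterministic_profile(n_positions=1000, tss=(100, 400), tes=900,
                          heights=(50.0, 30.0), ramp_len=50):
    """Ideal (noise-free) coverage profile.

    Each transcription start site in `tss` contributes a plateau of the
    corresponding height, ramping up linearly over `ramp_len` positions,
    and dropping to zero at the shared end site `tes`. Illustrative only.
    """
    profile = np.zeros(n_positions)
    for start, h in zip(tss, heights):
        contrib = np.zeros(n_positions)
        ramp_end = min(start + ramp_len, tes)
        # linear ramp from 0 up to the plateau height
        contrib[start:ramp_end] = np.linspace(0, h, ramp_end - start, endpoint=False)
        # flat plateau until the end site
        contrib[ramp_end:tes] = h
        profile += contrib
    return profile
```

The nice part of a deterministic generator like this is that every feature (ramp length, number of TSSs, heights) is a knob we can tune directly when probing the limit of detection.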
Can we write a modular noise function? This would be nice because it would let us create/test synthetic data sets under different noise assumptions.
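A modular noise layer could be as simple as passing the noise model in as a function - here's one way it could look, with Poisson and clipped-Gaussian noise as two example assumptions (these specific noise choices are just examples, not a recommendation):

```python
import numpy as np

def poisson_noise(expected, rng):
    """Shot noise: observed counts drawn from Poisson(expected)."""
    return rng.poisson(expected)

def gaussian_noise(expected, rng, sigma=5.0):
    """Additive Gaussian noise, clipped at zero so counts stay non-negative."""
    return np.clip(expected + rng.normal(0.0, sigma, size=expected.shape), 0, None)

def add_noise(expected, noise_fn=poisson_noise, seed=0):
    """Apply any noise function to a clean expected-coverage profile.

    Swapping `noise_fn` is what makes the noise model modular: the same
    clean profile can be corrupted under different noise assumptions.
    """
    rng = np.random.default_rng(seed)
    return noise_fn(expected, rng)
```

Keeping the clean signal and the noise model separate like this would also let us reuse the same noise functions for both the probabilistic and the deterministic generators.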
Note - this issue is intended more for brainstorming/planning!! :) Once we decide what we want to implement, we can open new issues for the generative model (or models) we want to build!