Closed grahamgower closed 4 years ago
I like it @grahamgower, I think having a function that returns a Model instance would be a nice simple way of setting things up.
I'm not totally convinced this is better for the generic models (which really do represent classes of generic models that can be instantiated, and are part of the module's public namespace), but I think it'd be much better for the concrete catalog models.
One situation in which it would make sense to keep the models as classes rather than just functions is when you wish to simulate under slightly different demographic parameters that are drawn from a prior that is specific to the model class. Atm, it seems like stdpopsim is focused on fixing the parameters using point estimates from papers (a v worthy and ambitious goal in itself!) -- but if this broader alternative was to ever be part of stdpopsim, it might be useful to keep things as they are. In this situation, each simulation would be generated from a particular demographic object that is instantiated from a broader model class.
Btw, I don't know whether doing this would be "too broad" for stdpopsim, but I think that this would be a useful thing to at least think about given the stated aims of the consortium re: doing powerful standardised benchmarking. It would allow users to perform "sensitivity analyses" more easily, i.e., tests of how robust various methods are to slight uncertainties and misspecifications in their model.
I really like the idea of having these kinds of models in stdpopsim. It can already be done. Either as a function,
from scipy.stats import uniform
import stdpopsim

def my_model_generator(nreps):
    # Note: scipy's uniform(loc, scale) is uniform on [loc, loc + scale].
    N0 = uniform(100, 500).rvs(size=nreps)
    N1 = uniform(1000, 10000).rvs(size=nreps)
    T = uniform(300, 3000).rvs(size=nreps)
    for n0, n1, t in zip(N0, N1, T):
        yield stdpopsim.PiecewiseConstantSize(n0, (t, n1))

def my_model_func():
    # just get one
    return next(my_model_generator(1))
Or as a class,
class MyModelClass(stdpopsim.PiecewiseConstantSize):
    # Priors are class attributes (frozen scipy distributions).
    N0 = uniform(100, 500)
    N1 = uniform(1000, 10000)
    T = uniform(300, 3000)

    def __init__(self):
        # __init__ must not return a value, so draw one value from each
        # prior here and pass the draws to the parent constructor.
        n0 = self.N0.rvs()
        n1 = self.N1.rvs()
        t = self.T.rvs()
        super().__init__(n0, (t, n1))

    @classmethod
    def generator(cls, nreps):
        # Each new instance draws its own parameters in __init__.
        for _ in range(nreps):
            yield cls()
Just a matter of taste.
Sure - but if only implemented as a function, it's harder to extract the exact demographic parameters used in a given sim. I think it would be more natural for these parameters to be attributes of a particular model instance that are simulated from methods in the model class. In your class code above for example, these parameters can be retrieved from myModelClass.N0, whereas in the function implementation, you'd have to return the parameters along with the simulation output.
Maybe a better way to explain it is in @jeromekelleher's comment:
I'm not totally convinced this is better for the generic models (which really do represent classes of generic models that can be instantiated, and are part of the module's public namespace), but I think it'd be much better for the concrete catalog models.
In a world where the catalog models have random rather than fixed parameter values, they basically just become another example of a generic model.
Oh, I somehow didn't see that big pull request @grahamgower. Sorry if this was unhelpful to hear only after doing all that work 😞
Sure - but if only implemented as a function, it's harder to extract the exact demographic parameters used in a given sim. I think it would be more natural for these parameters to be attributes of a particular model instance that are simulated from methods in the model class. In your class code above for example, these parameters can be retrieved from myModelClass.N0, whereas in the function implementation, you'd have to return the parameters along with the simulation output.
I don't think there's any difference here between getting the attributes from a super class or from an instance @gtsambos, it's the same variables that you'd be accessing. To do things properly, you'd need to think carefully about how to arrange the relevant variables to make them accessible for generating random values. These would look quite different to the classes that we currently have (which really are just functions returning a Model instance).
So, I think we should make the design we have as simple as possible to make things easy for model implementers for now, and think about how to generate random distributions around them later. Please do open up an issue for discussion around this though --- it's a really good application of these models.
I guess what I'm saying is that a class makes more sense to me when a number of different attributes or outputs might need to be bundled together in some way. In the catalog models as they currently stand, this isn't an issue -- the same demographic parameters hold regardless of the simulation run. This is also basically true for the generic models at the moment, where the users are inputting their desired parameter values. In a random parameter universe, each individual simulation is associated with its own, potentially unique set of parameters specific to that particular run -- so there are several outputs that the user needs to know about, and to me it seems like it would be cleanest to do this by keeping all of the parameters together in a class.
I totally understand the point about making things as easy as possible for developers, though.
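To make the bundling point concrete, here is a minimal sketch of keeping each run's drawn parameters together on one object. This is not stdpopsim code: `SampledModel`, its `draw` method, and the prior ranges are all hypothetical stand-ins, and only the standard library is used.

```python
import random
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SampledModel:
    """Hypothetical stand-in for a model instance; not a stdpopsim class."""
    N0: float  # present-day population size
    N1: float  # ancestral population size
    T: float   # time of the size change

    @classmethod
    def draw(cls, rng):
        # Placeholder prior ranges, chosen only for illustration.
        return cls(
            N0=rng.uniform(100, 600),
            N1=rng.uniform(1000, 11000),
            T=rng.uniform(300, 3300),
        )


rng = random.Random(42)
models = [SampledModel.draw(rng) for _ in range(3)]

# Each run's parameters travel with its model and can be logged next to
# whatever output that run produces.
records = [asdict(m) for m in models]
```

With this arrangement, the parameters for a given run are read straight off the instance (`models[0].T`), rather than being returned separately alongside the simulation output.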
Please do open up an issue for discussion around this though --- it's a really good application of these models.
Yes, I'd be interested in getting this into stdpopsim, and I think I'd also be quite handy for the development given some of my other recent work.
The functions return an instance of one of these classes though, so it's really just a semantic difference. Generating models from prior distributions will need extra infrastructure and something that looks quite different to what we have now. We're deliberately not exposing the details of how catalog models are implemented internally so that we have the freedom to rearrange things as we like. This is just an internal implementation detail, and I'm sure it'll all be refactored and rearranged many times --- but none of this will affect the end user. It's easy to refactor model definitions when they've been submitted and QC'd, but we have to make adding them as simple and streamlined as possible. The class infrastructure is just confusing boilerplate at the moment, so I say we get rid of it.
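A rough sketch of the "function returning a model instance" pattern described here, using a simplified stand-in class rather than the real stdpopsim one. `PiecewiseModel` and `catalog_model` are hypothetical names; the parameter values are just the ones used as examples elsewhere in this thread.

```python
class PiecewiseModel:
    """Simplified, hypothetical stand-in for a generic model class."""

    def __init__(self, N0, *epochs):
        self.N0 = N0
        self.epochs = list(epochs)  # (time, size) pairs


def catalog_model():
    """Hypothetical catalog entry: a plain function that fixes the parameters."""
    return PiecewiseModel(100, (600, 1000))


m = catalog_model()
```

The caller ends up with an instance of the generic class either way, which is why the choice between a subclass and a function is mostly an internal detail.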
Okay, I'm coming around -- that makes sense! :)
If all models are classes, and parameters are required to be attributes (as for N0, N1, T, in MyModelClass below), then this sort of model abuse would be possible:
from scipy.stats import uniform
import stdpopsim

class MyModelClass(stdpopsim.PiecewiseConstantSize):
    N0 = 100
    N1 = 1000
    T = 600

    def __init__(self):
        super().__init__(self.N0, (self.T, self.N1))

    # this could be a classmethod of stdpopsim.Model
    @classmethod
    def frob(cls, **kwargs):
        attr = {}
        for param, distrib in kwargs.items():
            attr[param] = distrib.rvs(size=1)[0]
        newclass = type("Frobbed"+cls.__name__, (cls,), attr)
        return newclass()

m = MyModelClass.frob(T=uniform(300, 3000))
print(m.T)
print(MyModelClass.T)
Although this kind of approach makes it hard to have a bound for one random variable depend on the value drawn for another.
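One possible way around the dependent-bounds limitation is to draw parameters in a fixed order and let each prior see the values drawn so far. This is only a sketch with the standard library; `draw_params`, the `priors` mapping, and the specific ranges (including the made-up rule that T is bounded by half of N1) are hypothetical and not part of stdpopsim.

```python
import random


def draw_params(priors, rng):
    """Draw parameters in order; each sampler sees the values drawn so far."""
    drawn = {}
    for name, sampler in priors.items():
        drawn[name] = sampler(rng, drawn)
    return drawn


# Hypothetical priors: the upper bound for T depends on the N1 already drawn.
# Dicts preserve insertion order, so N1 is always drawn before T.
priors = {
    "N0": lambda rng, d: rng.uniform(100, 600),
    "N1": lambda rng, d: rng.uniform(1000, 11000),
    "T": lambda rng, d: rng.uniform(300, 0.5 * d["N1"]),
}

params = draw_params(priors, random.Random(1))
```

The cost is that the ordering of the priors becomes part of the model definition, which is more machinery than a flat set of class attributes.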
stdpopsim/catalog/homsap.py has the following comment regarding Model subclasses:
The Model subclasses just define a few attributes and an __init__ function. So they're not really classes. Using PiecewiseConstantSize and GenericIM as examples, here's what the refactored versions could look like (untested!):
Obviously, all existing model definitions would need changing, plus documentation, ...