Open ariddell opened 9 years ago
In this code a Model means something with extensive latent variables (in addition to intensive latent variables). That is, a Model has a latent variable for each data point, hence it needs an add_data to 'wrap' data sequences into objects that glue on those latent variables.
That's how it is now, but maybe something else would be better. I think you're suggesting that Distributions should also have an add_data so they can implicitly remember data, so that e.g. calling resample() with no arguments would implicitly be resampling based on that added data. Is that right?
If that's the case, we could probably accomplish that with a mixin like this one (untested, probably doesn't work with Python 3 or at all):
class _AddDataMixin(object):
    def __init__(self, *args, **kwargs):
        self.data_list = []
        super(_AddDataMixin, self).__init__(*args, **kwargs)

    def add_data(self, data):
        self.data_list.append(data)

    def resample(self, data):
        super(_AddDataMixin, self).resample(combine_data(data, self.data_list))
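For anyone who wants to try the idea, here is a minimal Python 3 sketch of the same mixin that runs on its own. Several things in it are assumptions and not pybasicbayes code: `ToyMean` is a made-up stand-in for a distribution whose `resample` just sets `mu` to the sample mean (so the result is easy to check), the undefined `combine_data` helper is replaced by plain list concatenation, and `data` defaults to `None` so that `resample()` can be called with no arguments.

```python
class AddDataMixin:
    """Remembers added data so resample() can be called with no arguments."""

    def __init__(self, *args, **kwargs):
        self.data_list = []
        super().__init__(*args, **kwargs)

    def add_data(self, data):
        self.data_list.append(data)

    def resample(self, data=None):
        # Assumption: list concatenation stands in for combine_data.
        extra = [] if data is None else [data]
        super().resample(self.data_list + extra)


class ToyMean:
    """Hypothetical stand-in for a distribution (not a pybasicbayes class)."""

    def __init__(self, mu=0.0):
        self.mu = mu

    def resample(self, data=()):
        # Deterministic placeholder: set mu to the mean of all observations.
        values = [x for seq in data for x in seq]
        if values:
            self.mu = sum(values) / len(values)


class ToyMeanWithData(AddDataMixin, ToyMean):
    """The mixin must come first so its resample() wraps ToyMean's."""


d = ToyMeanWithData()
d.add_data([1.0, 2.0])
d.add_data([3.0])
d.resample()  # uses only the remembered data
print(d.mu)
```

The mixin is listed before the distribution in the bases so that its `__init__` and `resample` run first and delegate along the MRO.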
Can you spell out the use case you have in mind? Maybe gluing some data to a distribution for when we're working in a semi-supervised setting?
Yeah, that was more or less what I was thinking of. Perhaps there's some additional checking the mixin should do to make sure that the thing it is being mixed in with already has a resample method?
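One way such a check could look, as a rough sketch only: the mixin can inspect each new subclass when it is defined and refuse to be combined with bases that lack a `resample`. The class names below (`AddDataMixin`, `HasResample`, `Fine`, `Broken`) are made up for illustration, and the hook used is the standard `__init_subclass__` from Python 3.6+.

```python
class AddDataMixin:
    """Mixin that verifies a later base class provides resample()."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only classes after the mixin in the MRO can receive the
        # super().resample(...) call, so only those are inspected.
        mro = cls.__mro__
        later_bases = mro[mro.index(AddDataMixin) + 1:]
        if not any(callable(vars(base).get("resample")) for base in later_bases):
            raise TypeError(
                f"{cls.__name__} uses AddDataMixin, but no base class "
                "after the mixin defines resample()"
            )

    def __init__(self, *args, **kwargs):
        self.data_list = []
        super().__init__(*args, **kwargs)

    def add_data(self, data):
        self.data_list.append(data)

    def resample(self, data=None):
        extra = [] if data is None else [data]
        super().resample(self.data_list + extra)


class HasResample:
    """Hypothetical distribution-like class with its own resample()."""

    def resample(self, data=()):
        self.last_data = list(data)


class Fine(AddDataMixin, HasResample):
    pass


try:
    class Broken(AddDataMixin):  # nothing to delegate resample() to
        pass
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the check looks only at classes after the mixin in the MRO, it also rejects the wrong base order, e.g. `class X(HasResample, AddDataMixin)`.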
Also, what's the convention you're following for adding _ before some mixins and not for others?
I think it's a Python convention to put an underscore in front of "internal" things that aren't part of the user API. Or maybe I just made it up.
Got it. Just thinking that in this case one really wants to encourage reuse of the traits by users -- i.e., folks adding new distributions and (derived) models.
I might get around to doing this. It seems worthwhile just for teaching, e.g., to show someone how to get samples from a GaussianModel (derived from Gaussian), which, in theory, someone might want to do.
So the distinction between Model and Distribution is confusing for very simple models (e.g., sometimes a Gaussian is the model, and you want to call .add_data on that). What about a class decorator or metaclass wrapper that creates a DistNameModel for each DistName with no code duplication?
Or perhaps you're considering other API changes?
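To make the DistNameModel idea concrete, here is a small sketch that uses a plain factory function built on `type()` instead of a decorator or metaclass. Everything in it is hypothetical: `make_model_class`, the trimmed-down `AddDataMixin`, and `ToyGaussian` are illustration names, not part of the existing API.

```python
class AddDataMixin:
    """Trimmed-down mixin: only remembers the data that was added."""

    def __init__(self, *args, **kwargs):
        self.data_list = []
        super().__init__(*args, **kwargs)

    def add_data(self, data):
        self.data_list.append(data)


def make_model_class(dist_cls):
    """Create '<DistName>Model' from dist_cls without duplicating its code."""
    return type(dist_cls.__name__ + "Model", (AddDataMixin, dist_cls), {})


class ToyGaussian:
    """Hypothetical stand-in for a distribution class."""

    def __init__(self, mu=0.0, sigma=1.0):
        self.mu = mu
        self.sigma = sigma


ToyGaussianModel = make_model_class(ToyGaussian)

model = ToyGaussianModel(mu=1.5)
model.add_data([0.2, 0.4])
print(ToyGaussianModel.__name__, model.mu, model.data_list)
```

The generated class inherits everything from the distribution and only gains `add_data`, so each new distribution would get its Model variant from one line.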