mattjj / pybasicbayes

MIT License

Model vs. Distribution solution #25

Open ariddell opened 9 years ago

ariddell commented 9 years ago

So the distinction between Model and Distribution is confusing for very simple models (e.g., sometimes a Gaussian itself is the model, and you want to call .add_data on it). What about a class decorator or metaclass wrapper that creates a DistNameModel for each DistName with no code duplication?

Or perhaps you're considering other API changes?

mattjj commented 9 years ago

In this code a Model means something with extensive latent variables (in addition to intensive latent variables). That is, a Model has a latent variable for each data point, hence it needs an add_data to 'wrap' data sequences into objects that glue on those latent variables.

That's how it is now, but maybe something else would be better. I think you're suggesting that Distributions should also have an add_data so they can implicitly remember data, so that e.g. calling resample() with no arguments would implicitly be resampling based on that added data. Is that right?

If that's the case, we could probably accomplish that with a mixin like this one (untested, probably doesn't work with Python 3 or at all):

class _AddDataMixin(object):
    def __init__(self, *args, **kwargs):
        self.data_list = []
        super(_AddDataMixin, self).__init__(*args, **kwargs)

    def add_data(self, data):
        # remember the data so later resamples can use it implicitly
        self.data_list.append(data)

    def resample(self, data=()):
        # combine_data is an assumed helper that merges any explicitly
        # passed data with the remembered data_list
        super(_AddDataMixin, self).resample(combine_data(data, self.data_list))
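As a runnable sketch of how a mixin like that would compose with a distribution class: here ToyGaussian and combine_data are made-up stand-ins for illustration, not pybasicbayes code.

```python
def combine_data(data, data_list):
    # hypothetical helper: concatenate explicit and remembered data
    return list(data) + list(data_list)

class _AddDataMixin(object):
    def __init__(self, *args, **kwargs):
        self.data_list = []
        super(_AddDataMixin, self).__init__(*args, **kwargs)

    def add_data(self, data):
        self.data_list.append(data)

    def resample(self, data=()):
        # delegate to the distribution's resample with all known data
        super(_AddDataMixin, self).resample(combine_data(data, self.data_list))

class ToyGaussian(object):
    def __init__(self):
        self.mu = 0.0

    def resample(self, data):
        # toy "resample": just set mu to the sample mean of the data
        if data:
            self.mu = sum(data) / len(data)

class GaussianModel(_AddDataMixin, ToyGaussian):
    pass

m = GaussianModel()
m.add_data(1.0)
m.add_data(3.0)
m.resample()      # no explicit data: uses the remembered data
print(m.mu)       # 2.0
```

The mixin goes first in the bases so that its resample wraps the distribution's, and the cooperative super call threads the combined data through.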
mattjj commented 9 years ago

Can you spell out the use case you have in mind? Maybe gluing some data to a distribution for when we're working in a semi-supervised setting?

ariddell commented 9 years ago

Yeah, that was more or less what I was thinking of. Perhaps the mixin should do some additional checking to make sure that the class it is being mixed into already has a resample method?
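One hedged sketch of such a check, assuming the cooperative-super mixin pattern above (the class names and error message here are illustrative, not from pybasicbayes):

```python
class _AddDataMixin(object):
    def __init__(self, *args, **kwargs):
        # Fail fast: require that some class later in the MRO defines
        # resample, since the mixin's behavior only makes sense as a
        # wrapper around an existing resample method.
        if not hasattr(super(_AddDataMixin, self), 'resample'):
            raise TypeError(
                '_AddDataMixin must be mixed into a class with resample')
        self.data_list = []
        super(_AddDataMixin, self).__init__(*args, **kwargs)

class HasResample(object):
    def resample(self, data):
        pass

class Good(_AddDataMixin, HasResample):
    pass

class Bad(_AddDataMixin, object):
    pass

Good()   # constructs fine
try:
    Bad()
except TypeError as e:
    print('rejected:', e)
```

Looking up resample on the super() proxy skips the mixin's own definition and checks only the classes after it in the MRO, which is exactly the "does the base actually have resample" question.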

Also, what's the convention you're following for adding _ before some mixins and not for others?


mattjj commented 9 years ago

I think it's a Python convention to put an underscore in front of "internal" things that aren't part of the user API. Or maybe I just made it up.

ariddell commented 9 years ago

Got it. Just thinking that in this case one really wants to encourage reuse of the traits by users -- i.e., folks adding new distributions and (derived) models.

I might get around to doing this. It seems worthwhile just for teaching, e.g., showing how to get samples from a GaussianModel (derived from Gaussian), which, in theory, someone might want to do.