Code review - distributions

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L68-L69

Why set parameters=None as a default and not parameters={} if this is transformed anyway?

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L81-L89

It seems dangerous to me that the property and the setter have the same name

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L91-L111

Same here.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L113-L123

Why not return the empty dict here? I think there are settings in which you generally need the parameters of a distribution and hence use the parameters property. If there are none, Raising an error might be painful and might inhibit some algorithms from terminating. Thoughts?

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L125-L127

Agree that this is a valid to-do. IMO it should be done at the __init__stage.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L151

I personally am a big fan of not using maths-variables in code, i.e. no f, no x, no sigma, etc.. In my opinion that increases readability significantly. On top of this, it is part of Pep8 (which I am fine with breaking occasionally, as it is more of a guide than a law).

Suggestion to replace x here and in related functions: loc, pt, point, samp, sample (the "sample" input to PDFs is part of the description at the wikipedia page of PDFs), event. Thoughts?

I know that avoiding maths in scientific computing is tedious, but I have found it to be worthwile.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L165-L170

Replace hasattr with is None (I think the current version is an outdated, older version).

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L165-L170

I find it more logical to read if... elif... else or at least if... elif.

The same applies in the lines below in the code

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L242

If self._cdf is not defined this would lead to an error.

pnkraemer commented 4 years ago

About the previous one again: is "median" something that can be part of parameters?

Is parameters even required? It would be less type-checking and more concise if there was no parameters at all, and if a distribution depends on parameters, this would be part of the distribution.

For instance:

class Normal(Distribution):
    def __init__(self, mean, covar):
        self._mean = mean
        self._covar = covar
        Distribution.__init__(self, _gausspdf, _gausslogpdf, _gausscdf, _gausslogcdf))

or something along these lines.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L66-L79

Is Distribution intended to be subclassed or intended to be initialised with functions? The current implementation suggests the latter whereas the former is part of the docstring.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L261-L276

Did you mean covar?

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L278-L287

These kind of convenience methods just blow up classes, IMO. I think whoever is able to call covar is able to turn a variance into a standard deviation. Also, here one has to be careful with multivariate distributions (-> Cholesky) and univariate distributions (-> sqrt). For this reason, I would suggest to take out std.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L306-L490

If implementation of these dunder methods is supposed to be handled by subclasses, I would only write raise NotImplementedError here. That makes it clear that the Distribution class does not support addition, etc. by default.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L536-L558

IMO, this makes sense to be extracted into a function _scpstats_to_dist(...) or something along these lines.

pnkraemer commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L527-L528

One thing that might be useful to think about is whether we restrict Diracs, Gaussians, etc. to be supported on (d,) arrays and make scalars etc. raise errors. Everybody can transform a scalar into an array, but it is more convenient to work with numpy arrays only (we can then use operations like @ and others).

JonathanWenger commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L68-L69

Why set parameters=None as a default and not parameters={} if this is transformed anyway?

Having a mutable object like dict as an argument is an anti-pattern.

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L306-L490

If implementation of these dunder methods is supposed to be handled by subclasses, I would only write raise NotImplementedError here. That makes it clear that the Distribution class does not support addition, etc. by default.

Good idea. This will help in resolving #33.

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L81-L89

It seems dangerous to me that the property and the setter have the same name

This is proper python 3 syntax. See https://docs.python.org/3/library/functions.html#property.

coldfix commented 4 years ago

Hi, sorry to tune in like this, I just looked in this issue by accident.

If implementation of these dunder methods is supposed to be handled by subclasses, I would only write raise NotImplementedError here. That makes it clear that the Distribution class does not support addition, etc. by default.

The proper way to defer handling to other dunders is to return NotImplemented, see https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy. (Also sorry if I read this out of context, I didn't read through the thread)

pnkraemer commented 4 years ago

Thanks @JonathanWenger and @coldfix, I didnt know these things :)

JonathanWenger commented 4 years ago

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L113-L123

Why not return the empty dict here? I think there are settings in which you generally need the parameters of a distribution and hence use the parameters property. If there are none, Raising an error might be painful and might inhibit some algorithms from terminating. Thoughts?

self._parameters is initialized to an empty dict, so if it is None something went wrong. Changed it to an AttributeError.

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L151

I personally am a big fan of not using maths-variables in code, i.e. no f, no x, no sigma, etc.. In my opinion that increases readability significantly. On top of this, it is part of Pep8 (which I am fine with breaking occasionally, as it is more of a guide than a law).

Suggestion to replace x here and in related functions: loc, pt, point, samp, sample (the "sample" input to PDFs is part of the description at the wikipedia page of PDFs), event. Thoughts?

I know that avoiding maths in scientific computing is tedious, but I have found it to be worthwile.

I agree with you with the maths notation. However for things like pdf I tried to follow scipy convention. I am also not sure what term to use here, e.g. pdf(quantile) or pdf(sample) seem way more confusing to me.

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L242

If self._cdf is not defined this would lead to an error.

I would say this is desired fall-back behaviour. If the median is otherwise known, subclasses of distributions can override this method.

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L278-L287

These kind of convenience methods just blow up classes, IMO. I think whoever is able to call covar is able to turn a variance into a standard deviation. Also, here one has to be careful with multivariate distributions (-> Cholesky) and univariate distributions (-> sqrt). For this reason, I would suggest to take out std.

I think convenience methods are good for inexperienced users. These will always raise an error unless var is implemented by the sub-class and thus only available for 1d distributions.

https://github.com/probabilistic-numerics/probnum/blob/9a4e6cf4aee888ad447fe4228e576c7284aa90dd/src/probnum/probability/distribution.py#L536-L558

IMO, this makes sense to be extracted into a function _scpstats_to_dist(...) or something along these lines.

✅

JonathanWenger commented 4 years ago

Issues needing no discussion are fixed in 151fa36.

probabilistic-numerics / probnum

Code review - distributions #38