probtorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

What's our type promise? #135

Open ragulpr opened 6 years ago

ragulpr commented 6 years ago

Since we started the discussion on Slack and the design doc isn't up to date on this, I thought it would be good to discuss it here for future reference.

Currently in the design doc:

Input Types    
What input types should we support? In Probabilistic Torch we have so far required that parameters are either Variables or Numbers. Should we additionally support Tensor parameter arguments? If so, then how do we deal with gradients in reparameterized distributions when Tensor arguments are supplied? 

So it feels like it's not settled, or I'm just not keeping up.

Problems

1) We worry about keeping compatibility with Numbers, which creates the need for validate_log_prob_arg.
2) What should be acceptable query types for cdf, inv_cdf, pdf, pmf? Shouldn't these be validated as with log_prob? It would be nice if scalar queries were accepted.
3) broadcast_all is potentially expensive (#80, see code).
4) Despite upcasting we still do things conditionally on type, such as math.log(self.scale) if isinstance(self.scale, Number) else self.scale.log() (see the sketch after this list).
5) We need a specialized Softmax for both Tensors and Variables.
6) Tests are largely concerned with the problem of mixing scalars, tensors and variables.
7) Slightly related: we infer type, precision and device placement in rsample, but how will we utilize precision-dependent epsilons in a clean way?
8) Type confusion seems to block cool ideas.
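To make item 4 concrete, here is a minimal sketch of the branching pattern versus the upcast-once alternative; the helper names are hypothetical and torch.tensor is the post-0.4 spelling:

    import math
    from numbers import Number

    import torch

    # Item 4: the type-conditional pattern currently used inside some distributions.
    def log_scale_conditional(scale):
        return math.log(scale) if isinstance(scale, Number) else scale.log()

    # The single-code-path alternative argued for in this issue: upcast once, then stop branching.
    def log_scale_upcast(scale):
        scale = scale if isinstance(scale, torch.Tensor) else torch.tensor(float(scale))
        return scale.log()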

Possible reasons

Possible solutions

I'm thinking from the assumption that we'll soon have deprecated Tensors in favor of Variables and 0d Variables. Since mixing is a hurdle, can't we just decide something like:

This is mostly the case currently anyway.

We have a good entry point for casting, as parameters will be upcast and reshaped using broadcast_all, and arguments to (at least) log_prob will have to pass validate_log_prob_arg.
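For reference, a simplified sketch of what a broadcast_all-style entry point does; this is not the real implementation, it uses the later torch.broadcast_tensors helper and omits the Tensor/Variable distinction:

    import numbers

    import torch

    def broadcast_all_sketch(*values):
        # Upcast plain numbers to tensors, then broadcast everything to a common shape.
        if not all(isinstance(v, (numbers.Number, torch.Tensor)) for v in values):
            raise ValueError('Input arguments must all be instances of '
                             'numbers.Number or torch.Tensor.')
        tensors = [v if isinstance(v, torch.Tensor) else torch.tensor(float(v)) for v in values]
        return torch.broadcast_tensors(*tensors)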

With a strong promise on input and return type this could be made fast (as proposed in #80). If this is made fast before we've figured this out we might be stuck with it.

I also like how the nn module works, and I expect to be able to do things like calling .double() or .float() on a distribution. This is easily implemented in Distribution if we assume programmatic access to all the parameters of a distribution as a tuple or dict, as discussed in #130. By clearing up types now I imagine that the transition will require fewer ifs and elses.
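A hypothetical sketch of how this could look, assuming (as in #130) that a distribution exposes its parameters programmatically as a dict; none of these method names are part of the actual Distribution API:

    class Distribution:
        @property
        def params(self):
            # Assumed: subclasses return their parameters, e.g. {'loc': self.loc, 'scale': self.scale}.
            raise NotImplementedError

        def _apply(self, fn):
            # Rebuild the distribution with converted parameters.
            return type(self)(**{name: fn(p) for name, p in self.params.items()})

        def float(self):
            return self._apply(lambda p: p.float())

        def double(self):
            return self._apply(lambda p: p.double())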

fritzo commented 6 years ago

I think the type promise should follow torch.add() rather than nn.Module. Distributions are mere flyweight classes that wrap a collection of functions. Distributions have no state (but they may cache computations). Adding a .float() method to distributions seems like overengineering to me; it would be like adding a .float() method to functools.partial(torch.add, x).
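As a toy illustration of the analogy (the names here are just for the example, using the post-0.4 torch.tensor spelling):

    import functools

    import torch

    x = torch.tensor(2.0)
    add_x = functools.partial(torch.add, x)   # a stateless wrapper around a function and its argument
    add_x(torch.tensor(3.0))                  # tensor(5.)
    # In this view, Distribution.float() would be as odd as expecting add_x.float() to exist.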

fritzo commented 6 years ago

Hi @ragulpr, after reading your comment on the design doc more carefully, I believe there is merely a misunderstanding of the term 'scalar' in PyTorch. In PyTorch, 'scalar' is a concept orthogonal to 'Variable': a Tensor or Variable is a scalar iff it has empty shape, i.e. x.shape == torch.Size(). In torch.distributions we allow two distinct types of inputs for parameters: Variables and numbers.Number (including float, int, etc.). The latter are not PyTorch scalars, they are 'numbers'. In torch.distributions we always upcast a number to a Variable. In PyTorch 0.4, the number will be upcast to a Variable that is a scalar, i.e. has empty shape. This is the same behavior as e.g. torch.add().
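In post-0.4 terms, a small illustration of the distinction (torch.tensor is the 0.4+ spelling; before the Tensor/Variable merge one would wrap a Tensor in a Variable):

    import numbers

    import torch

    x = torch.tensor(1.0)             # a PyTorch scalar: a tensor with empty shape
    assert x.shape == torch.Size([])
    assert not isinstance(x, numbers.Number)

    n = 1.0                           # a numbers.Number, not a PyTorch scalar
    assert isinstance(n, numbers.Number)

    y = torch.add(x, n)               # the number is upcast, just as torch.distributions
    assert y.shape == torch.Size([])  # upcasts Number parameters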

ragulpr commented 6 years ago

Hi, and thanks @fritzo for the answer. I don't see how Distributions have no state; aren't the parameters state? By scalar I mean Number, which I understand as a 0d float, int or similar. Currently in pytorch-master, Tensor parameters are allowed, and judging by broadcast_all on the probtorch fork it seems similar:

    raise ValueError('Input arguments must all be instances of numbers.Number, torch.Tensor or ' +
                     'torch.autograd.Variable.')

I think I went into too much detail. In short, I'm having a hard time understanding why there's so much effort keeping alive the possibility of initializing a distribution with Number parameters, while all query values to log_prob, cdf, pdf etc.[1] are required to be Tensor or Variable due to validate_log_prob_arg. This is especially puzzling since, given initialization with Numbers, such queries are forced to be Variables anyway, because the current broadcast_all casts Numbers to Variables (and Tensors and Variables don't mix).
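To make the asymmetry concrete (using the 0.4-style spelling for brevity; the rejection of the plain-Number query is the validate_log_prob_arg behaviour described above, not something I have re-checked against master):

    import torch
    from torch.distributions import Normal

    d = Normal(0.0, 1.0)              # Number parameters are accepted and upcast internally
    d.log_prob(torch.tensor(0.5))     # queries have to be Tensor/Variable...
    # d.log_prob(0.5)                 # ...while a plain Number query is rejected by validate_log_prob_arg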

It feels hard to keep track of and test for all the errors that can occur, and I'm wondering if it's possible to simplify this.

My use for Distributions is mainly as a really nice "prediction" and loss object for probabilistic/density networks, used for querying/analysis or for sampling, so if I had to choose between being able to query with Numbers or constructing with Number parameters, I'd choose the former. My usage might not be the intended one, though. Currently the latter is implemented with all the complexity of keeping compatibility with Tensors, while we tacitly don't support it since Numbers are in reality cast to Variables. So why not just cast everything to Variable?

[1] I noticed now that we validate_log_prob_arg for all queries, but it's not merged to master.

ragulpr commented 6 years ago

Also, I think I see more use in being able to infer the current precision/storage of parameters than in casting them to another precision, in order to abstract _finfo so that it's distribution-independent. This information could also be used instead of dist.scale.new(_), but maybe this is bikeshedding 😃
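A rough sketch of the idea, using the later torch.finfo API (at the time of this thread a private _finfo helper in torch.distributions served a similar purpose); the function name is made up:

    import torch

    def eps_like(param):
        # Pick a machine epsilon that matches the parameter's precision,
        # instead of hard-coding one per distribution.
        return torch.finfo(param.dtype).eps

    scale = torch.tensor([1.0, 2.0])
    tiny = eps_like(scale)                   # ~1.19e-07 for float32
    same_storage = torch.zeros_like(scale)   # covers the dist.scale.new(...) use case as well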

ragulpr commented 6 years ago

I missed the related #5248, #132 and #134, which seem to establish a nice practice of validating parameters upon construction (not through broadcast_all), and some discussion about the inconsistency of catching errors for args to log_prob vs sample. Sorry for starting a separate issue.