pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
8.49k stars, 1.97k forks

DOC: Clarify shape vs size #7194

Open lciti opened 3 months ago

lciti commented 3 months ago

Issue with current documentation:

After seeing that some methods, like dist, accept both shape and size, I looked for information about the difference between the two but could not find anything useful. I had read the Distribution Dimensionality page before, but I still couldn't understand how they differ. I then dug into the code, found https://github.com/pymc-devs/pymc/blob/a06081e1e9649bd56e3528cb96380efdf6bb2dc0/pymc/distributions/shape_utils.py#L267, and it made sense. I wonder whether this information should be more prominent, for example in the glossary on the Distribution Dimensionality page.

Incidentally, I think there is a mistake in the definition "Shape → Number of draws from a distribution". If I understand correctly, "size" is the number of draws, not "shape": if I take 5 draws from a Dirichlet([1,1,1]), the size is (5,) and the shape is (5,3).
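To make the Dirichlet example concrete, here is a minimal sketch that uses NumPy's Generator.dirichlet instead of PyMC itself. It assumes NumPy's size argument follows the same batch-only convention discussed in this thread: size counts only the independent draws, while the resulting shape also includes the support dimension (length 3 here).

```python
import numpy as np

rng = np.random.default_rng(seed=0)
alpha = np.array([1.0, 1.0, 1.0])  # Dirichlet concentration; support length is 3

# size describes only the batch dimensions, i.e. how many draws we take
draws = rng.dirichlet(alpha, size=5)

size = (5,)
shape = draws.shape  # batch dimensions followed by the support dimension
print("size :", size)
print("shape:", shape)  # (5, 3)

# each draw lies on the probability simplex, so every row sums to 1
print(np.allclose(draws.sum(axis=-1), 1.0))
```

The leading part of shape equals size; the trailing 3 comes from the length of alpha, not from the number of draws.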

Idea or request for content:

I would add a clearer definition of "shape" and "size" somewhere more prominent in the documentation. A good place could be the glossary on the Distribution Dimensionality page.

welcome[bot] commented 3 months ago

🎉 Welcome to PyMC! 🎉 We're really excited to have your input into the project! 💖
If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

ricardoV94 commented 3 months ago

We are still not sure we want to keep size around, and we try not to emphasize it too much to new users. Size contains a subset of the information that shape has (it leaves out the support dimensions), so it is strictly less informative.
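A minimal sketch of why size carries less information than shape: for a distribution with ndim_supp support dimensions, size is shape with the trailing support dimensions dropped. The helper size_from_shape below is hypothetical (it is not a PyMC function) and only illustrates the relationship described in this thread.

```python
def size_from_shape(shape, ndim_supp):
    """Hypothetical helper: drop the trailing support dimensions from shape."""
    shape = tuple(shape)
    if len(shape) < ndim_supp:
        raise ValueError(f"shape {shape} has fewer than {ndim_supp} dimensions")
    return shape[: len(shape) - ndim_supp]


# A Dirichlet has one support dimension (ndim_supp=1)
print(size_from_shape((5, 3), ndim_supp=1))  # (5,)
# A different support length maps to the same size...
print(size_from_shape((5, 4), ndim_supp=1))  # (5,)
# ...so size alone cannot recover the full shape.

# A scalar-support (univariate) distribution has ndim_supp=0, so size == shape
print(size_from_shape((5, 3), ndim_supp=0))  # (5, 3)
```

Going from shape to size is always possible once ndim_supp is known; going back requires extra information (the support shape, usually implied by the parameters).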

When working with dims, one also thinks in terms of shape. However, at the moment size is used internally by the PyTensor RandomVariable Ops.
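A rough sketch of why dims go naturally with shape: each named dim contributes the length of its coordinate labels, and dims name every dimension of a variable, including the support dimension. The coords dictionary and the dims_to_shape helper below are hypothetical illustrations, not PyMC's own implementation.

```python
# Hypothetical coords, in the spirit of model coords mapping dim names to labels
coords = {
    "subject": ["a", "b", "c", "d", "e"],
    "category": ["x", "y", "z"],
}


def dims_to_shape(dims, coords):
    """Hypothetical helper: each named dim contributes len(coords[dim])."""
    return tuple(len(coords[d]) for d in dims)


# e.g. a Dirichlet over categories, one per subject:
# "subject" is a batch dim, "category" is the support dim
print(dims_to_shape(("subject", "category"), coords))  # (5, 3)
```

Because the support dim is named too, the dims spell out the full shape (5, 3), not just the batch part (5,).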

ricardoV94 commented 3 months ago

https://github.com/pymc-devs/pytensor/discussions/52

lciti commented 3 months ago

Thank you for your reply! I wish I had seen https://github.com/pymc-devs/pytensor/discussions/52 sooner!! :-D I am trying to implement the dist function for a multivariate CustomDist whose support dimension is not directly given by any of its parameters. Since dist takes size rather than shape, I spent some time trying to find a clever way around this. In the end I reluctantly resorted to splitting off the last shape dimension into a separate parameter, which, as that page suggests, is pretty much the only solution. Feel free to close this issue if it is not relevant, unless you want to add a description of size with a disclaimer that it may disappear.
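A NumPy-only sketch of the workaround described above: the support length n, which would otherwise be the last entry of a full shape, becomes an explicit parameter, so the function only needs size for the batch dimensions. The function name and the standard-normal body are assumptions for illustration; the rng/size keyword arguments mimic the convention used for a CustomDist random callable, but this sketch is not wired into PyMC.

```python
import numpy as np


def random_with_support_param(n, rng=None, size=None):
    """Hypothetical random function: n is the support length, size the batch shape.

    The support dimension is passed explicitly as a parameter instead of being
    taken from the last entry of a full shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    if size is None:
        batch = ()
    elif np.isscalar(size):
        batch = (int(size),)
    else:
        batch = tuple(size)
    # Illustrative body: i.i.d. standard normals laid out as batch dims + one support dim
    return rng.standard_normal(batch + (int(n),))


rng = np.random.default_rng(1)
x = random_with_support_param(3, rng=rng, size=5)
print(x.shape)  # (5, 3)
y = random_with_support_param(4, rng=rng)
print(y.shape)  # (4,)
```

With this split, the full shape is always size followed by (n,), which is exactly the information a shape argument would have carried.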