pymc-devs / pytensor

PyTensor allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays.
https://pytensor.readthedocs.io
Other
300 stars 91 forks source link

Implement equivalent numpy median and quantile / percentile #53

Open ricardoV94 opened 1 year ago

ricardoV94 commented 1 year ago

Please describe the purpose of filing this issue

Equivalent symbolic methods to those are missing.

The Numpy quantile and percentile methods have too many options for the interpolation argument, and these are planned to be deprecated for a while now (see https://github.com/numpy/numpy/issues/10736). It should suffice to implement the default "linear" interpolation.

I am confident that these should not require any extra Ops.

Dhruvanshu-Joshi commented 1 month ago

Here do we want to use the np.median itself or do we need to right the logic for median in pytensor?

ricardoV94 commented 1 month ago

Here do we want to use the np.median itself or do we need to right the logic for median in pytensor?

Good question, depends on how complicated the median code would look like if written in pure PyTensor. If it's messy we can just use np.median under the hood

Dhruvanshu-Joshi commented 1 month ago

I think using numpy under the hood is better. They even handle nans and have also allow inplace median by overwriting the input whenever required. So a new op makes sense for this or just a function which calls np.median under the hood will be enough?

ricardoV94 commented 1 month ago

For median we can simply (but a bit naively) sort out and pick the middle index (or the average of the two middle ones if it's even length). There are better approaches, but this may be a good enough solution for now: https://rcoh.me/posts/linear-time-median-finding/

ricardoV94 commented 1 month ago

or just a function which calls np.median under the hood will be enough?

If using numpy code, it must always be inside an Op. Normal numpy code won't work with PyTensor symbolic variables (which are just placeholders and don't contain values yet)

Dhruvanshu-Joshi commented 1 month ago

or just a function which calls np.median under the hood will be enough?

If using numpy code, it must always be inside an Op. Normal numpy code won't work with PyTensor symbolic variables (which are just placeholders and don't contain values yet)

Ohk. So I'll try to write the median in pytensor itself since we want to avoid creating new ops everytime right?

ricardoV94 commented 1 month ago

Ohk. So I'll try to write the median in pytensor itself since we want to avoid creating new ops everytime right?

Not anytime, but as much as possible!