Open erwald opened 8 months ago
@erwald can you give me a code sample?
Basically I want to see two things to understand:
1.) Write the code that uses apply the way that actually works
2.) Write pretend code that uses sq.sample the way you ideally want it to work should this feature be implemented correctly
Here's some code:
import pandas as pd
import squigglepy as sq
N = 1000
series = pd.Series(range(1, 5)) * sq.norm(mean=0, sd=1)
print(series.apply(lambda row: sq.get_percentiles(row @ N, percentiles=[50]))) # works
print(sq.get_percentiles(series @ N, percentiles=[50])) # would be nice if it did work
The first print statement will output a series of medians:
: 0 0.046183
: 1 -0.003956
: 2 -0.016223
: 3 -0.025846
The second print statement does not work because you currently can't sample a series or data frame. I mostly want this as a convenience, but it might also improve performance, since I believe sampling the whole series at once would benefit from vectorized operations (like multiply, etc.).
(Of course the above example would also require get_percentiles to be vectorized in this way.)
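To make the performance point concrete, here is a rough sketch in plain NumPy of what "sampling every row at once" could look like. It is not squigglepy's API: it assumes each row is just a scaled standard normal, as in the example above, and the names (scales, samples, rng) are only for illustration.

```python
import numpy as np

N = 1000
rng = np.random.default_rng(0)

# Scale factors 1..4 stand in for the four rows of the series above.
scales = np.arange(1, 5)

# A single call draws an (n_rows, N) matrix:
# row i holds N samples of scales[i] * Normal(0, 1).
samples = scales[:, None] * rng.standard_normal((len(scales), N))

# Percentiles along axis=1 give one value per row for each requested percentile.
p5, p50, p95 = np.percentile(samples, [5, 50, 95], axis=1)
print(p50)
```

The multiplication and the percentile computation each run once over the whole matrix rather than once per row, which is where any speedup over apply would come from.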
A common use case here is that I have some time series of estimates represented as squigglepy distributions, and I want to get the medians and 5th and 95th percentiles (or w/e) for each row for plotting.
It would be nice to have a vectorized version of sq.sample. I often find myself having data frames or series that contain distributions, and when I want to expand those to medians and percentiles, I have to use apply, which is slow and a bit verbose. Nicer would be to be able to sample the entire series at once. (This is a feature request.)