Open jwhitaker-gridcog opened 7 months ago
I don't know for sure but I suspect it is intended behavior. This way, after running upsample you can rely on the spacing from row to row being consistent. I think you need to do something like this:
df.join(
df.upsample("t", every="120s"),
on='t', how='outer_coalesce'
)
Not sure on the intended behaviour, but this is what is happening:
upsample generates the range and performs a left join.
The range in this case just returns the start value (as the interval exceeds the range)
# datetime_range is the datetime counterpart of date_range in newer Polars releases
df.select(
range = pl.datetime_range(pl.col("t").min(), pl.col("t").max(), interval="120s")
)
# shape: (1, 1)
# ┌─────────────────────────┐
# │ range │
# │ --- │
# │ datetime[ms, UTC] │
# ╞═════════════════════════╡
# │ 2021-01-01 00:00:00 UTC │
# └─────────────────────────┘
I guess there are two schools of thought.
Doing #2 is certainly better than what I said above, which does two joins when you only need one.
I have the same issue.
One correction: it does not discard only existing higher-frequency data; in fact, it discards any data that does not match the target sampling frequency.
It is surprising, and I would invite either:
- the documentation to explicitly explain this behaviour and provide an example of how to succeed,
- or the upsample method to behave more as many would expect.
My attempt to rewrite my code from pandas to polars failed on this, which shows that in this regard (resampling) pandas provides better service. (I am aware pandas has been with us for a very long time, and I really love polars.)
Yeah, in my case I have low-frequency data, but because the points do not perfectly align, it straight up discards data. Very misleading function!
Checks
Reproducible example
Log output
Issue description
When running upsample, existing data at a higher frequency than the upsample target seems to be discarded. E.g. in the example, my dataframe has two rows at 00:00 and 00:01, and I upsample to 2m (same result with 1h, 61s, ...). After upsampling, the second row has been discarded.
Expected behavior
Maybe naïvely, I expected existing higher-frequency data to be preserved.
If this behaviour from polars is intentional, then I'd at least have expected to see something in the upsample docs like
or at a minimum
Thanks for your work maintaining Polars!
Why are you upsampling to a lower frequency than your data, are you stupid?
For context, my situation here was that I have data that is a mix of daily and hourly. There are missing records in both parts. I was attempting to upsample to daily first, because I wanted to treat missing records here differently to missing hourly records. However, this got me to hit this bug/unexpected behaviour, because upsampling to daily chucked away some of the hourly records.
Installed versions