openclimatefix / ocf-data-sampler

A test repo to experiment refactoring ocf_datapipes
MIT License
1 stars 1 forks source link

Refactor fill time periods function #25

Closed Sukh-P closed 1 month ago

Sukh-P commented 1 month ago

Pull Request

Description

Small refactor of fill_time_periods to remove using iterrows() to hopefully make it more efficient

How Has This Been Tested?

A unit test already existed for this which is great so just checked that still passes

Sukh-P commented 1 month ago

Could we use:

 start_dts = pd.to_datetime(time_periods["start_dt"]).ceil(freq)
 end_dts = pd.to_datetime(time_periods['end_dt'])

?

So that won't work for the start_dts but yeah that's a nicer way to do the end_dts, will change that now

dfulu commented 1 month ago

So that won't work for the start_dts but yeah that's a nicer way to do the end_dts, will change that now

Oh okay, why won't it work for the start_dts? I ran:

pd.to_datetime(["2020-01-01 11:22", "2020-01-01 11:55",]).ceil(pd.Timedelta("1h"))
>> DatetimeIndex(['2020-01-01 12:00:00', '2020-01-01 12:00:00'], dtype='datetime64[ns]', freq=None)

Am I missing something?

Sukh-P commented 1 month ago

I was missing a .values! Works now

dfulu commented 1 month ago

I was missing a .values! Works now

Cool! Nice work. Sorry to nitpick but to keep it cleaner can we add .values to both, otherwise it raises the question of why only have it for one. Right now I think start_dt is a pd.DatetimeIndex but end_dt is a pd.Series