skrub-data / skrub

Prepping tables for machine learning
https://skrub-data.org/
BSD 3-Clause "New" or "Revised" License

Enhance Datetime Features with Cyclical Encoding to Preserve Metric #1127

Closed mmartinb75 closed 3 weeks ago

mmartinb75 commented 3 weeks ago

Problem Description

The library currently extracts datetime features like hour, day, month, etc., but doesn't account for their cyclical nature. This can lead models to misjudge how close two time values are. For example, a model given the raw hour might treat hour 23 as being far from hour 0, while in reality they are adjacent.
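To make the hour 23 vs. hour 0 example concrete, here is a minimal sketch using only the Python standard library. The helper names `encode_cyclical` and `euclidean` are hypothetical and are not part of skrub; the point is only to compare the gap between the raw values with the gap between their sine/cosine encodings.

```python
import math


def encode_cyclical(value, period):
    """Map a value in [0, period) to a (sin, cos) point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def euclidean(p, q):
    """Straight-line distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Raw hours: 23 and 0 look as far apart as two hours can be.
raw_gap = abs(23 - 0)

# Encoded hours: 23 and 0 are neighbours, 12 and 0 are opposite each other.
gap_23_to_0 = euclidean(encode_cyclical(23, 24), encode_cyclical(0, 24))
gap_0_to_1 = euclidean(encode_cyclical(0, 24), encode_cyclical(1, 24))
gap_12_to_0 = euclidean(encode_cyclical(12, 24), encode_cyclical(0, 24))
```

With this encoding the distance from 23 to 0 equals the distance from 0 to 1, and the largest possible distance is between hours that are 12 apart.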

Feature Description

Add two new features for each extracted datetime component (second, minute, hour, day, month, and week) using sine and cosine transformations. The period of each transformation will match the component's cycle length: 60 for second and minute, 24 for hour, the actual number of days in the given month for day, and 12 for month.
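The proposal could be sketched roughly as follows, using only the standard library. This is an illustration under stated assumptions, not skrub's implementation: the function name `cyclical_datetime_features` and the `*_sin` / `*_cos` output names are hypothetical, day and month are shifted to start at 0 so that the first value of each cycle maps to angle 0, and week is left out because no period is given for it above.

```python
import calendar
import math
from datetime import datetime


def cyclical_datetime_features(dt):
    """Return sine/cosine features for the components of one datetime.

    Hypothetical helper for illustration only; not part of skrub.
    """
    # The period for "day" is the actual length of the given month.
    days_in_month = calendar.monthrange(dt.year, dt.month)[1]

    # (zero-based value, period) for each component.
    components = {
        "second": (dt.second, 60),
        "minute": (dt.minute, 60),
        "hour": (dt.hour, 24),
        "day": (dt.day - 1, days_in_month),  # assumption: shift 1-based day to 0
        "month": (dt.month - 1, 12),         # assumption: shift 1-based month to 0
    }

    features = {}
    for name, (value, period) in components.items():
        angle = 2.0 * math.pi * value / period
        features[f"{name}_sin"] = math.sin(angle)
        features[f"{name}_cos"] = math.cos(angle)
    return features


example = cyclical_datetime_features(datetime(2024, 2, 29, 18, 0, 0))
```

Each component contributes one pair of columns, so the five components here yield ten features. Using the real month length means the last day of a 29-day February and the last day of a 31-day month both sit one step before the first day of the next cycle.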

rcap107 commented 3 weeks ago

Related to #907

jeromedockes commented 3 weeks ago

Thanks @mmartinb75, that's a great suggestion, and as @rcap107 mentioned it is being discussed in #907. Basically, the discussion has sort of stalled on the question of whether to use sine/cosine features or splines, but it would be great to revive it and add this.

I will close this issue to avoid duplicating the conversation, but I'll link your comment from there. Please feel free to engage in the discussion in #907!

mmartinb75 commented 3 weeks ago

Right! Thanks a lot, @jeromedockes.