google / temporian

Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖
https://temporian.readthedocs.io
Apache License 2.0

New operator: z-score normalization #343

Open ianspektor opened 9 months ago

ianspektor commented 9 months ago

New EventSet.z_score_normalize() (name TBD) operator.

See here for how to compute it.
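
For reference, the standard z-score subtracts the mean of a feature and divides by its standard deviation. The symbols below are notation chosen for illustration, and the population form (dividing by $N$) is shown as an assumption; this issue does not settle whether the operator would use the population or the sample standard deviation.

$$
z_i = \frac{x_i - \mu}{\sigma}, \qquad \mu = \frac{1}{N}\sum_{j=1}^{N} x_j, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{j=1}^{N} (x_j - \mu)^2}
$$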

See https://github.com/google/temporian/blob/main/CONTRIBUTING.md#developing-a-new-operator for guidance.

Questions or requests for additional guidance from possible contributors are more than welcome!

akshatvishu commented 6 months ago

Hey @ianspektor, I have a few questions about putting this into action:

Q1) Will this be a Python-only operator or a C++ one?

Q2) As far as I understand, we can't use scipy, so we can't call scipy.stats.zscore directly. Do we keep the same arguments as scipy.stats.zscore? I'm also interested in how we deal with NaNs.

Q3) What data types will this operator support? All numeric?

ianspektor commented 6 months ago

Tagging @javiber, he's the go-to person from now on for all things contributing :)

javiber commented 6 months ago

Hi @akshatvishu, I think we can implement this one using numpy's mean and std without going down to C++.

Scipy's implementation for future reference: https://github.com/scipy/scipy/blob/v1.13.0/scipy/stats/_stats_py.py#L3021
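
A minimal sketch of what a numpy-only computation could look like, to make the discussion concrete. This is not Temporian code: the function name `z_score_sketch`, its `ddof` parameter (mirroring scipy's default of 0), and the choice to ignore NaNs when computing the statistics are all assumptions made for illustration. How the real operator should treat NaNs is still an open question in this thread.

```python
import numpy as np


def z_score_sketch(values, ddof: int = 0) -> np.ndarray:
    """Hypothetical helper: z-score normalize along axis 0 using only numpy.

    Assumption: NaNs are skipped when computing the mean and std, and stay
    NaN in the output. scipy.stats.zscore propagates NaNs by default instead.
    """
    values = np.asarray(values, dtype=np.float64)

    # nanmean / nanstd ignore NaN entries when computing the statistics.
    mean = np.nanmean(values, axis=0)
    std = np.nanstd(values, axis=0, ddof=ddof)

    # A constant feature has std == 0; let the division yield NaN quietly
    # rather than emit a warning.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (values - mean) / std


if __name__ == "__main__":
    print(z_score_sketch([1.0, 2.0, 3.0]))
    print(z_score_sketch([1.0, np.nan, 3.0]))
```

In an actual operator the statistics would presumably be computed per index group and per feature, which this sketch does not attempt.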