sintel-dev / Zephyr

https://dtail.gitbook.io/zephyr/
MIT License

Integrating `SigPro` #7

Closed sarahmish closed 1 year ago

sarahmish commented 1 year ago

SigPro allows users to process time series signals and perform a wide range of transformations and aggregations. We want to allow users to use SigPro through Zephyr.

Assuming we have a set of transformations and aggregations, we want to apply them to pidata or scada data.

Desired Behavior

Suppose we have the following pidata dataframe:

| _index | timestamp | COD_ELEMENT | val1 | val2 |
|---|---|---|---|---|
| 0 | 2022-01-02 13:21:01 | 0 | 1002.0 | -98.7 |
| 1 | 2022-03-08 13:21:01 | 0 | 56.8 | 1004.2 |

and we want to compute the mean of the amplitude for each month of readings for the column val1, using the SigPro primitive `sigpro.aggregations.amplitude.statistical.mean`. We then get the following processed dataframe:

| _index | time | COD_ELEMENT | mean |
|---|---|---|---|
| 0 | 2022-01-31 | 0 | 1002.0 |
| 1 | 2022-02-28 | 0 | null |
| 2 | 2022-03-31 | 0 | 56.8 |
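
For reference, the expected output above can be reproduced with plain pandas. This is only a sketch of the desired result, not the SigPro pipeline itself: it assumes the second value column is named `val2`, that the mean is taken over `val1`, and it uses a month-end resample in place of the SigPro aggregation primitive.

```python
import pandas as pd

# Example pidata readings from the issue (column name "val2" is assumed).
pidata = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-01-02 13:21:01", "2022-03-08 13:21:01"]),
    "COD_ELEMENT": [0, 0],
    "val1": [1002.0, 56.8],
    "val2": [-98.7, 1004.2],
})

# Bin val1 into month-end windows per COD_ELEMENT and take the mean.
# Months without readings (February here) come out as NaN.
monthly = (
    pidata.set_index("timestamp")
    .groupby("COD_ELEMENT")["val1"]
    .resample(pd.offsets.MonthEnd())
    .mean()
    .rename("mean")
    .reset_index()
    .rename(columns={"timestamp": "time"})
)
monthly = monthly[["time", "COD_ELEMENT", "mean"]]
print(monthly)
```

The empty February window is kept rather than dropped, which matches the `null` row in the table above.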

Proposed Function

def process_signals(es, signal_dataframe_name, signal_column, transformations, aggregations,
                    window_size, replace_dataframe=False, **kwargs):
    '''Process signals using SigPro.

    Apply SigPro transformations and aggregations on the specified entity from the
    given entityset. If ``replace_dataframe=True``, then the old entity will be updated.

    Args:
        es (featuretools.EntitySet):
            Entityset to extract signals from.
        signal_dataframe_name (str):
            Name of the dataframe in the entityset containing signal data to process.
        signal_column (str):
            Name of the column containing the signal values to apply the signal processing pipeline to.
        transformations (list[dict]):
            List of dictionaries containing the transformation primitives.
        aggregations (list[dict]):
            List of dictionaries containing the aggregation primitives.
        window_size (str):
            Size of the window to bin the signals over, e.g. ``'1h'``.
        replace_dataframe (bool):
            If ``True``, will replace the entire signal dataframe in the EntitySet with the
            processed signals. Defaults to ``False``, creating a new child dataframe containing
            processed signals with the suffix ``_processed``.
    '''
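
To make the argument shapes concrete, here is a rough sketch of how the inputs to the proposed function might look. The dictionary keys `name` and `primitive` are an assumption about the primitive specification format, the dataframe name `pidata` and column `val1` come from the example above, and `'1h'` is simply the window string used in the docstring. Since `process_signals` is only proposed here, the call itself is shown as a comment.

```python
# Assumed primitive spec format: "name" labels the output column and
# "primitive" is the fully qualified SigPro primitive name.
transformations = []  # no transformations are needed for a plain mean
aggregations = [
    {
        "name": "mean",
        "primitive": "sigpro.aggregations.amplitude.statistical.mean",
    }
]

# Keyword arguments matching the proposed signature.
call_kwargs = {
    "signal_dataframe_name": "pidata",
    "signal_column": "val1",
    "transformations": transformations,
    "aggregations": aggregations,
    "window_size": "1h",  # window string taken from the docstring example
    "replace_dataframe": False,
}

# Hypothetical call once the function exists, with `es` an EntitySet:
# process_signals(es, **call_kwargs)
```

With `replace_dataframe=False`, the docstring implies the processed signals would be stored in a new child dataframe with the suffix `_processed`, leaving the original dataframe untouched.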