Open brightcoder01 opened 4 years ago
def add_extracted_ts_features(
        iterator,
        id_column,
        time_column,
        value_columns,
        windows,
        extract_setting):
    """
    Extract the features from the input containing the time series
    data and then append the extracted features to the source input.

    Arguments:
        iterator:
            The iterator for the input data.
        id_column:
            str. The name of the id column to group by.
        time_column:
            str. The name of the time column.
        value_columns:
            List of str. The names of the columns holding the time
            series data.
        windows:
            List of int. The sliding window sizes with which we will
            roll the original data.
        extract_setting:
            str. The feature extraction setting. It's one of the values in
            ['Comprehensive', 'Efficient', 'IndexBased',
            'Minimal', 'TimeBased']

    Returns:
        A pandas.DataFrame. It contains both the original columns from
        the input iterator and the feature columns extracted from the
        time series columns.
    """
    pass
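To make the role of windows concrete, here is a minimal pure-Python sketch of the rolling step, assuming trailing windows and a tiny min/max/mean feature set. It is a stand-in for illustration only: the built-in function is meant to be backed by TSFresh, and the helper name rolling_minimal_features, the feature naming scheme, and the list-of-dicts input and output (instead of a pandas.DataFrame) are all assumptions, not part of the proposal.

```python
from collections import defaultdict
from statistics import mean


def rolling_minimal_features(rows, id_column, time_column, value_columns, windows):
    """Append per-window min/max/mean features to each row (illustrative only)."""
    # Group the rows by id so that each time series is processed on its own.
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_column]].append(row)

    result = []
    for group in groups.values():
        # Order each series by time before rolling over it.
        group.sort(key=lambda r: r[time_column])
        for i, row in enumerate(group):
            out = dict(row)  # keep the original columns
            for w in windows:
                # Assumed semantics: a trailing window made of the current
                # row plus up to w - 1 preceding rows of the same id.
                window_rows = group[max(0, i - w + 1): i + 1]
                for col in value_columns:
                    values = [r[col] for r in window_rows]
                    out[f"{col}_w{w}_min"] = min(values)
                    out[f"{col}_w{w}_max"] = max(values)
                    out[f"{col}_w{w}_mean"] = mean(values)
            result.append(out)
    return result


rows = [
    {"id": "a", "record_date": 1, "pv": 10, "uv": 3},
    {"id": "a", "record_date": 2, "pv": 20, "uv": 5},
    {"id": "a", "record_date": 3, "pv": 30, "uv": 4},
    {"id": "b", "record_date": 1, "pv": 7, "uv": 2},
]
features = rolling_minimal_features(rows, "id", "record_date", ["pv", "uv"], [1, 2])
```

Each output row keeps its original columns and gains one feature column per value column, window size, and statistic, which mirrors the "original plus extracted columns" shape described in the docstring.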
SELECT * FROM source_table
TO RUN add_extracted_ts_features
WITH
    id_column = id,
    time_column = record_date,
    value_columns = ['pv', 'uv'],
    windows = [1, 5, 10],
    extract_setting = Minimal
INTO result_table
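The attribute names in the WITH clause line up with the arguments documented in the function's docstring, so one way to picture the hand-off is as a dictionary of keyword arguments. The sketch below is a rough illustration under that assumption. The helper parse_with_attributes is hypothetical, it is not SQLFlow's parser, and it only handles the simple one-attribute-per-line layout used in the statement above.

```python
import ast


def parse_with_attributes(with_body):
    """Parse 'key = value' lines of a WITH clause into a dict (illustrative)."""
    kwargs = {}
    for raw_line in with_body.strip().splitlines():
        # Drop surrounding whitespace and the trailing attribute separator.
        line = raw_line.strip().rstrip(",")
        if not line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        try:
            # Lists, numbers and quoted strings parse as Python literals.
            kwargs[key.strip()] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Bare words such as id or Minimal are kept as plain strings.
            kwargs[key.strip()] = value
    return kwargs


with_body = """
id_column = id,
time_column = record_date,
value_columns = ['pv', 'uv'],
windows = [1, 5, 10],
extract_setting = Minimal
"""
kwargs = parse_with_attributes(with_body)
```

With a dictionary like this, the call would amount to add_extracted_ts_features(iterator, **kwargs), provided the function's parameter names match the attribute names.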
SQLFlow describes an end-to-end machine learning pipeline. Data transformation is an important part of the entire process.
COLUMN clause.
TO RUN clause to support this functionality. Please refer to the discussion in #2137.
Please check the example SQL statement, where:
{function_name} is the name of the data transformation function. It can be either a built-in function from SQLFlow or a customized function provided by users. We will support built-in functions as the first step. TSFresh is our first built-in function.
{source_table} is the name of the input table from which the transform function reads the data.
{result_table} is the name of the output table into which the transform function writes the processed result.
The design doc link.
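As a small illustration of how the three placeholders fit together, the sketch below renders a TO RUN statement from its parts. The statement layout is inferred from the example statement in this issue, and the helper render_to_run and its attributes parameter are hypothetical names used only for this example.

```python
def render_to_run(source_table, function_name, result_table, attributes):
    """Render a TO RUN statement from its three names and WITH attributes."""
    # One "key = value" line per attribute, comma-separated as in the example.
    with_lines = ",\n".join(
        f"{key} = {value}" for key, value in attributes.items()
    )
    return (
        f"SELECT * FROM {source_table}\n"
        f"TO RUN {function_name}\n"
        f"WITH\n{with_lines}\n"
        f"INTO {result_table}"
    )


statement = render_to_run(
    source_table="source_table",
    function_name="add_extracted_ts_features",
    result_table="result_table",
    attributes={"id_column": "id", "windows": [1, 5, 10]},
)
print(statement)
```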
Task breakdown
[x] Upgrade parser to support TO RUN statement
[ ] Translate TO RUN to a workflow
[x] Upgrade goalisa to submit PyODPS task. Enable submitting ODPS SQL and PyODPS task on the deployment of Dataworks.
[ ] sqlflow.runner module.
[ ] TSFresh high-level API implementation and Docker image.
[ ] TO RUN function repo sample.