I would like to reuse PipelineGenerator and add PythonOperator support to it, so that we can use it for the Dataflow and Cloud ML Python APIs along with BigQuery. There are corresponding airflow.contrib operators that could potentially be used as well, and I am not sure why datalab defined its own Load/Execute/Extract operators. I was wondering whether it is possible to standardize on PythonOperator. Here is an example:
class ExecuteOperator(BaseOperator):  # bigquery.contrib.operator

    def execute(self, context):
        job = query.execute(output_options=output_options,
                            query_params=query_params)
        return {
            'table': job.result().full_name
        }
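For comparison, here is a minimal sketch of how the same step could be expressed with Airflow's stock PythonOperator. It assumes the datalab-style query.execute() API shown above; run_query and all of the task wiring are illustrative names, not existing datalab code:

```python
# Hypothetical sketch: the BigQuery execute step as a plain callable that a
# stock airflow PythonOperator could run. `query.execute(...)` mirrors the
# datalab-style API used in the ExecuteOperator example above; everything
# else (run_query, the task wiring) is illustrative.

def run_query(query, output_options=None, query_params=None, **context):
    # Airflow pushes the callable's return value to XCom, so downstream
    # tasks can read the produced table name, just like ExecuteOperator's
    # return dict.
    job = query.execute(output_options=output_options,
                        query_params=query_params)
    return {'table': job.result().full_name}

# Wiring it into a DAG would look roughly like:
# execute_task = PythonOperator(
#     task_id='execute_query',
#     python_callable=run_query,
#     op_kwargs={'query': my_query},
#     provide_context=True,
#     dag=dag)
```

Because PythonOperator only needs a callable, this keeps the BigQuery/Dataflow/CloudML specifics out of the operator layer entirely.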
Is it possible to use something like the following, and apply the same pattern to the Dataflow/CloudML runners as well, or is the idea to come up with a DataflowOperator/CloudMLOperator in datalab?

    def create_operator(self, query: bigquery.Query):
        return PythonOperator(query.execute, output_options)

I looked into it further. I can reuse PipelineGenerator, but in Datalab we would need to define a DataflowOperator + CloudMLOperator. Please let me know if there are plans to add these as part of the mlworkbench effort.
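If standardizing on PythonOperator does not pan out, a DataflowOperator could presumably follow the same shape as the existing Load/Execute/Extract operators. A hypothetical sketch follows; neither this operator nor the pipeline-building callable exists in datalab today, and BaseOperator is stubbed so the sketch runs standalone:

```python
# Hypothetical sketch only: a DataflowOperator shaped like datalab's
# Load/Execute/Extract operators. Nothing here is existing datalab code.

class BaseOperator(object):
    """Minimal stand-in for airflow.models.BaseOperator so this runs standalone."""
    def __init__(self, *args, **kwargs):
        pass

class DataflowOperator(BaseOperator):
    def __init__(self, pipeline_fn, pipeline_options, *args, **kwargs):
        super(DataflowOperator, self).__init__(*args, **kwargs)
        # pipeline_fn: a callable that builds and returns a Beam-style
        # pipeline given pipeline options (an assumed contract).
        self._pipeline_fn = pipeline_fn
        self._pipeline_options = pipeline_options

    def execute(self, context):
        result = self._pipeline_fn(self._pipeline_options).run()
        result.wait_until_finish()
        # The returned dict lands in XCom for downstream tasks, like the
        # ExecuteOperator example above.
        return {'job_state': result.state}
```

A CloudMLOperator could take the same shape, swapping the pipeline callable for a training/prediction job submission.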