googledatalab / pydatalab

Google Datalab Library
Apache License 2.0

PythonOperator / Airflow Contrib Operators #692

Open Debasish-Das-CK opened 6 years ago

Debasish-Das-CK commented 6 years ago

I would like to reuse PipelineGenerator and add PythonOperator to it, so that we can use it for the Dataflow and Cloud ML Python APIs along with BigQuery. There are corresponding airflow.contrib operators that could potentially be used as well, and I am not sure why Datalab defines its own Load/Execute/Extract operators. I was wondering whether it is possible to standardize on PythonOperator. Here is an example:

    bigquery.contrib.operator.ExecuteOperator(BaseOperator):
        def execute(self, context):
            job = query.execute(output_options=output_options, query_params=query_params)
            return { 'table': job.result().full_name }

Is it possible to use something like the following, and apply the same pattern to the Dataflow/CloudML runners as well? Or is the idea to come up with a DataflowOperator/CloudMLOperator in Datalab?

    def create_operator(self, query: bigquery.Query):
        return PythonOperator(query.execute, output_options)
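For illustration, the pattern above can be sketched in a self-contained way. Airflow's PythonOperator takes its callable and arguments as keywords (`task_id`, `python_callable`, `op_kwargs`), so the positional call above would need adjusting. The `PythonOperator` class below is a simplified stand-in, not the real Airflow class, and `FakeQuery` is a hypothetical placeholder for a `bigquery.Query`; its return value is made up for the example.

```python
from typing import Any, Callable, Dict, Optional


class PythonOperator:
    """Simplified stand-in mirroring the keyword arguments of Airflow's
    PythonOperator. It only stores a callable and invokes it in execute()."""

    def __init__(self, task_id: str, python_callable: Callable[..., Any],
                 op_kwargs: Optional[Dict[str, Any]] = None) -> None:
        self.task_id = task_id
        self.python_callable = python_callable
        self.op_kwargs = op_kwargs or {}

    def execute(self, context: Dict[str, Any]) -> Any:
        # Call the wrapped function with the stored keyword arguments.
        return self.python_callable(**self.op_kwargs)


class FakeQuery:
    """Hypothetical placeholder for a bigquery.Query-like object."""

    def __init__(self, sql: str) -> None:
        self.sql = sql

    def execute(self, output_options: Optional[dict] = None,
                query_params: Optional[list] = None) -> Dict[str, str]:
        # A real query would submit a job; here we return a made-up table name.
        return {'table': 'project.dataset.result'}


def create_operator(task_id: str, runner: Callable[..., Any],
                    **kwargs: Any) -> PythonOperator:
    """Wrap any runner callable (BigQuery, Dataflow, Cloud ML) the same way."""
    return PythonOperator(task_id=task_id, python_callable=runner,
                          op_kwargs=kwargs)


query = FakeQuery('SELECT 1')
op = create_operator('bq_execute', query.execute,
                     output_options={'mode': 'overwrite'})
result = op.execute(context={})
```

Under these assumptions, the same `create_operator` helper could wrap a Dataflow or Cloud ML runner function without a dedicated operator class per service.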

Debasish-Das-CK commented 6 years ago

I looked into it further. I can reuse PipelineGenerator, but in Datalab we would need to define DataflowOperator and CloudMLOperator. Please let me know if there are plans to add them as part of the mlworkbench effort.