sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0
5.04k stars 697 forks source link

Distributed data computing in TO RUN clause. #2297

Open brightcoder01 opened 4 years ago

brightcoder01 commented 4 years ago

In SQLFlow TO RUN clause, it will call a python function to do the data processing/computing. Such as use TSFresh to extract features from time series data.

There are two options for large-scale data computing with python:

The compare between Dask and mars: issue

wangkuiyi commented 4 years ago

Thanks to @brightcoder01 for raising this question.

I reorganized the comparison in the above comment into a table.

Dask Mars
Kubernetes Yes No
MaxCompute No Yes

It seems that the question is

I think the answer is to let the author of a_python_func to make the choice among Mars, Dask, Ray, Kubernetes API, etc.

brightcoder01 commented 4 years ago

Add notes from discussion: