sql-machine-learning / sqlflow

Brings SQL and AI together.

https://sqlflow.org

Apache License 2.0

Tracing error before submitting generated code to cluster #2228

Open Yancey1989 opened 4 years ago

Yancey1989 commented 4 years ago

#2205 tries to load the `estimator` to diagnose missing model arguments and raises `SQLFlowDiagnosticError` with a diagnostic message. This works well when the generated code runs on the workflow step host.

However, SQLFlow sometimes submits the generated code to a cluster to run as a distributed job, e.g. via pai_submitter or alisa_submitter.

In this case, the error is raised from the distributed task instead. More checks are needed before submitting to the cluster, for two reasons:

- Saving the user's waiting time: a cluster job can stay pending for a long time if the cluster is busy.
- Reducing wasted resources: some errors can be found before submitting to the cluster.

A viable solution is to DRY-RUN the generated code before submitting it, which can include:

- Diagnosing missing/unexpected model arguments.
- Diagnosing invalid model argument types.
- Diagnosing inconsistencies between data types and the COLUMN clause.
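
The first two checks above could be sketched roughly as follows. This is only an illustration under assumptions, not SQLFlow's implementation: `dry_run_check` and `DemoEstimator` are hypothetical names, `SQLFlowDiagnosticError` is redefined locally as a stand-in for the error type mentioned above, and the sketch assumes the estimator's constructor carries type annotations. The COLUMN clause check is not covered.

```python
import inspect


class SQLFlowDiagnosticError(Exception):
    """Local stand-in for the error type named in this issue."""


class DemoEstimator:
    """Hypothetical estimator, used only to exercise the checks."""

    def __init__(self, hidden_units: list, n_classes: int = 2):
        self.hidden_units = hidden_units
        self.n_classes = n_classes


def dry_run_check(estimator_cls, model_params):
    """Validate model_params against the constructor without building the model."""
    sig = inspect.signature(estimator_cls.__init__)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    accepts_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()
    )
    # Named constructor parameters, excluding self, *args and **kwargs.
    named = {
        name: p
        for name, p in sig.parameters.items()
        if name != "self" and p.kind not in variadic
    }

    # 1. Required arguments that the user did not provide.
    missing = sorted(
        name
        for name, p in named.items()
        if p.default is inspect.Parameter.empty and name not in model_params
    )
    if missing:
        raise SQLFlowDiagnosticError("missing model arguments: %s" % missing)

    # 2. Arguments the constructor does not accept.
    unexpected = sorted(set(model_params) - set(named))
    if unexpected and not accepts_kwargs:
        raise SQLFlowDiagnosticError("unexpected model arguments: %s" % unexpected)

    # 3. Type mismatches, checked only where the annotation is a plain class.
    for name, value in model_params.items():
        param = named.get(name)
        if param is None:
            continue
        expected = param.annotation
        if isinstance(expected, type) and not isinstance(value, expected):
            raise SQLFlowDiagnosticError(
                "model argument %r expects %s, got %s"
                % (name, expected.__name__, type(value).__name__)
            )


# Passes silently: all required arguments are present and well typed.
dry_run_check(DemoEstimator, {"hidden_units": [10, 10], "n_classes": 3})
```

Because the check only inspects the constructor signature, it can run on the workflow step host before any cluster resources are requested.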

Yancey1989 commented 4 years ago

We plan to refactor the submitter module in Python. After that, we won't need to implement DRY-RUN on the current codebase; moving the submitter from Go to Python can fix this problem.