sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0

Tracing error before submitting generated code to cluster #2228

Open Yancey1989 opened 4 years ago

Yancey1989 commented 4 years ago

#2205 tries to load the estimator to diagnose missing model argument errors, and raises SQLFlowDiagnosticError with a diagnostic message. This works well when the generated code runs on the workflow step host.
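As a rough illustration of this kind of diagnosis, the sketch below inspects an estimator's constructor signature for required arguments the user did not supply, without constructing or training anything. It is not the code from #2205. diagnose_missing_args, the toy DNNClassifier, and the local SQLFlowDiagnosticError class are hypothetical stand-ins, and the dict argument represents the user's model parameters.

```python
import inspect


class SQLFlowDiagnosticError(Exception):
    """Local stand-in for SQLFlow's diagnostic error type (hypothetical)."""


def diagnose_missing_args(estimator_cls, model_params):
    """Raise SQLFlowDiagnosticError if required constructor args are missing.

    Hypothetical helper: it reads the signature of estimator_cls.__init__
    instead of instantiating the estimator, so the check costs almost nothing.
    """
    sig = inspect.signature(estimator_cls.__init__)
    missing = [
        name
        for name, param in sig.parameters.items()
        if name != "self"
        and param.default is inspect.Parameter.empty
        and param.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                           inspect.Parameter.KEYWORD_ONLY)
        and name not in model_params
    ]
    if missing:
        raise SQLFlowDiagnosticError(
            "%s missing required model arguments: %s"
            % (estimator_cls.__name__, ", ".join(missing)))


class DNNClassifier:
    """Toy estimator used only for this example (hypothetical)."""

    def __init__(self, hidden_units, n_classes=2):
        self.hidden_units = hidden_units
        self.n_classes = n_classes


# Usage: hidden_units has no default and was not supplied.
try:
    diagnose_missing_args(DNNClassifier, {"n_classes": 3})
except SQLFlowDiagnosticError as e:
    print(e)  # DNNClassifier missing required model arguments: hidden_units
```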

However, SQLFlow sometimes submits the generated code to a cluster to run as a distributed job, e.g. via pai_submitter/alisa_submitter.

In this case, the error is raised from inside the distributed task. It is worth doing more checks before submitting to the cluster, for two reasons:

  1. Saving the user's waiting time: a cluster job may stay pending for a long time if the cluster is busy.
  2. Reducing wasted resources: some errors can be detected before the job is submitted to the cluster.

A viable solution is to DRY-RUN the generated code before submitting it, which can include:

Yancey1989 commented 4 years ago

We plan to rewrite the submitter module in Python. Moving the submitter from Go to Python can fix this problem, so we don't need to implement DRY-RUN in the current codebase.
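Once the submitter lives in Python, a dry-run step could be as simple as running cheap local checks on the generated program and calling the cluster submitter only if they all pass. The following is a minimal sketch under that assumption: submit_with_dry_run, syntax_check, DryRunError, and the submit callable are hypothetical names, not SQLFlow APIs, and a syntax check via Python's built-in compile() is just one example of a check that needs no cluster resources.

```python
from typing import Callable, List


class DryRunError(Exception):
    """Raised when the generated program fails a pre-submission check (hypothetical)."""


Check = Callable[[str], None]


def syntax_check(program: str) -> None:
    # compile() parses the program without executing it, so syntax errors
    # surface locally instead of inside a distributed task.
    try:
        compile(program, "<generated>", "exec")
    except SyntaxError as e:
        raise DryRunError("syntax error on line %s: %s" % (e.lineno, e.msg))


def submit_with_dry_run(program: str,
                        submit: Callable[[str], str],
                        checks: List[Check]) -> str:
    """Run all local checks first; call submit() only if every check passes.

    submit stands for a hypothetical cluster submitter (for example, one that
    launches a PAI job) and is assumed to return a job ID.
    """
    errors = []
    for check in checks:
        try:
            check(program)
        except DryRunError as e:
            errors.append(str(e))
    if errors:
        # Fail fast: nothing is queued on the cluster.
        raise DryRunError("; ".join(errors))
    return submit(program)


# Usage with a fake submitter that records what reached the "cluster".
submitted = []


def fake_submit(program: str) -> str:
    submitted.append(program)
    return "job-1"


print(submit_with_dry_run("x = 1\n", fake_submit, [syntax_check]))  # job-1
try:
    submit_with_dry_run("x = (\n", fake_submit, [syntax_check])
except DryRunError as e:
    print("rejected:", e)
print(len(submitted))  # 1
```

Collecting every failing check before raising, rather than stopping at the first, lets the user fix all reported problems in one round trip, which matters most when each retry means another wait in the cluster queue.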