2205 tries to load the estimator to diagnose missing model argument errors and raises `SQLFlowDiagnosticError` with a diagnostic message. This works well when the generated code runs on the workflow step host.
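A minimal sketch of this kind of diagnosis follows. It assumes the estimator is a Python class whose constructor parameters are the model arguments from the `WITH` clause. Instead of constructing the estimator, it inspects the constructor signature, so the check has no side effects. `diagnose_model_args` and `ToyDNNClassifier` are hypothetical names, and `SQLFlowDiagnosticError` is redefined locally as a stand-in for SQLFlow's own class.

```python
import inspect


class SQLFlowDiagnosticError(Exception):
    """Stand-in for SQLFlow's diagnostic error; assumed to be a plain Exception here."""


# Parameter kinds that can be passed by name from the WITH clause.
_NAMED_KINDS = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)


def diagnose_model_args(estimator_cls, attrs):
    """Compare WITH-clause attributes against the estimator's constructor signature."""
    params = list(inspect.signature(estimator_cls).parameters.values())
    named = {p.name for p in params if p.kind in _NAMED_KINDS}
    required = {
        p.name for p in params
        if p.kind in _NAMED_KINDS and p.default is inspect.Parameter.empty
    }
    # A constructor taking **kwargs accepts any extra name, so skip that check then.
    accepts_var_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)

    missing = sorted(required - set(attrs))
    unexpected = [] if accepts_var_kwargs else sorted(set(attrs) - named)

    messages = []
    if missing:
        messages.append("missing required model arguments: " + ", ".join(missing))
    if unexpected:
        messages.append("unexpected model arguments: " + ", ".join(unexpected))
    if messages:
        raise SQLFlowDiagnosticError("; ".join(messages))


class ToyDNNClassifier:
    """Hypothetical estimator used only for this example."""

    def __init__(self, hidden_units, n_classes=2, learning_rate=0.1):
        self.hidden_units = hidden_units
        self.n_classes = n_classes
        self.learning_rate = learning_rate
```

For example, `diagnose_model_args(ToyDNNClassifier, {"n_classes": 3})` raises an error naming the missing `hidden_units`, and a misspelled attribute such as `hiden_layers` is reported as unexpected.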
However, SQLFlow sometimes submits the generated code to a cluster to run as a distributed job, e.g. `pai_submitter`/`alisa_submitter`.
In that case, the error is raised from inside the distributed task. It is worth doing more checks before submitting to the cluster, because doing so:
- saves the user's waiting time: a cluster job may stay pending for a long time if the cluster is busy, only to fail on an error that could have been caught earlier;
- reduces wasted resources: some errors can be detected before the job is submitted to the cluster.
A viable solution is to DRY-RUN the generated code before submitting it. The dry run could include:
- diagnosing missing/unexpected model arguments;
- diagnosing invalid model argument types;
- diagnosing inconsistencies between the data types and the COLUMN clause.
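The last two checks could be sketched as follows. The sketch assumes the dry run has access to the expected Python type of each model argument and to the schema of the selected table (field name mapped to SQL type). The function names are hypothetical, and `ALLOWED_SQL_TYPES` is an assumed, simplified table of which SQL types each COLUMN function accepts, not SQLFlow's actual rules. `SQLFlowDiagnosticError` is redefined locally as a stand-in.

```python
class SQLFlowDiagnosticError(Exception):
    """Stand-in for SQLFlow's diagnostic error; assumed to be a plain Exception here."""


# Assumed, simplified rules: SQL field types each COLUMN function accepts.
_NUMERIC_SQL_TYPES = {"INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL"}
ALLOWED_SQL_TYPES = {
    "NUMERIC": _NUMERIC_SQL_TYPES,
    "CATEGORY_ID": {"INT", "BIGINT"},
    "CATEGORY_HASH": {"VARCHAR", "STRING", "INT", "BIGINT"},
}


def check_arg_types(attrs, expected_types):
    """Report model arguments whose values do not match the expected Python type."""
    errors = []
    for name, value in attrs.items():
        expected = expected_types.get(name)
        if expected is not None and not isinstance(value, expected):
            errors.append(
                f"model argument {name} expects {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


def check_columns(columns, schema):
    """Report COLUMN clause entries that are missing from, or inconsistent with, the table."""
    errors = []
    for kind, field in columns:
        if field not in schema:
            errors.append(f"COLUMN {kind}({field}): field {field} not found in the selected table")
            continue
        sql_type = schema[field].upper()
        allowed = ALLOWED_SQL_TYPES.get(kind)
        if allowed is not None and sql_type not in allowed:
            errors.append(f"COLUMN {kind}({field}): field type {sql_type} is not accepted")
    return errors


def dry_run(attrs, expected_types, columns, schema):
    """Run all local checks and raise once with every problem found."""
    errors = check_arg_types(attrs, expected_types) + check_columns(columns, schema)
    if errors:
        raise SQLFlowDiagnosticError("\n".join(errors))
```

Collecting all errors before raising lets the user fix every problem in one round, rather than resubmitting once per error.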
We plan to refactor the submitter module in Python. Given that, we don't need to implement DRY-RUN on the current Go codebase: moving the submitter from Go to Python can fix this problem.