sql-machine-learning / sqlflow

Brings SQL and AI together.
https://sqlflow.org
Apache License 2.0
5.09k stars 698 forks

The third-party parser starts a Java process every time it parses a SQL program #1389

Closed (wangkuiyi closed this issue 4 years ago)

wangkuiyi commented 4 years ago

https://github.com/sql-machine-learning/sqlflow/blob/5bf232b52f02f8bfc72862044dc699a51ad6a94c/pkg/sql/tpp/parser.go#L120-L125

Starting a Java process is expensive, and the code linked above does so on every parse, which would reduce the QPS of the SQLFlow server. We need to turn the third-party parsers into gRPC servers.

wangkuiyi commented 4 years ago

The following three unit tests run the same set of cases. The two that call Java programs parse extremely slowly.

=== RUN   TestParseWithMySQL
--- PASS: TestParseWithMySQL (0.01s)
=== RUN   TestParseWithHive
--- PASS: TestParseWithHive (44.52s)
=== RUN   TestParseWithMaxCompute
--- FAIL: TestParseWithMaxCompute (35.77s)

Going by these timings, calling the HiveQL Java command-line parser (44.52s) costs roughly 4,450 times as much as calling the TiDB parser (0.01s).

tonyyang-svail commented 4 years ago

TODO list:

Java Server

Go Client

Profiling