Closed tonyyang-svail closed 4 years ago
This is definitely needed. Each parser would be one independent jar file to the gRPC service.
Totally agree.
Also, we need to deploy the interface to a Maven repository, such as mvnrepository.com or some private repository.
Following the comments, I updated the design as follows.

- `parserInterface` is used (required) by the gRPC parser server.
- `parserInterface` is realized (implemented) by each SQL dialect parser.
- `parserInterface` is released to the Maven Central repository (guide).
- [...] `parserInterface` and forms a standalone `.jar` package.
- [...] `parserInterface` and forms a standalone `.jar` package.
- [...] (`org.sqlflow.parser.internal.odpsParser`) in the command line argument and including the parser `.jar` file in the `CLASSPATH`. We can also make the server take multiple parser classpaths to deploy multiple parsers in one server.

TODO list:

- [...] `parserInterface`.
- [...] `.jar` file, and enable dynamic loading.
- [...] `parserInterface` package to remote repository and make the code base depend on the remote package.

I am following this guide to release SQLFlow Java code to the Central Maven Repository:
The Java gRPC server loads the parser via dynamic loading (implementation ref). To make the loading configurable, we need to pass it as a parameter, which requires the following fields.

- The jar file path, e.g. `/opt/sqlflow/parser/parser-calcite-0.0.1-dev-jar-with-dependencies.jar`.
- The class name within the jar file, e.g. `org.sqlflow.parser.calcite.CalciteParserAdaptor`.
- The dialect name associated with the parser, e.g. `calcite`.

I am proposing configuring it as a `-l` flag (`l` for loading). The format is

{jar_file_path}/{dialect}/{class_name}

For example, `/opt/sqlflow/parser/parser-calcite-0.0.1-dev-jar-with-dependencies.jar/calcite/org.sqlflow.parser.calcite.CalciteParserAdaptor`.

Also, we can support loading multiple `.jar` files by separating each config with `:`, i.e.

{config1}:{config2}:...

@typhoonzero @weiguoz @Yancey1989 Please let me know if you are happy with this format. Any suggestions are welcome.
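For illustration, here is a minimal sketch of how a `-l` value in the proposed format could be parsed. `ParserConfig`, `parse`, and `parseAll` are hypothetical names and not part of the SQLFlow code base. Since the jar path itself contains `/`, the sketch splits each entry from the right.

```java
import java.util.ArrayList;
import java.util.List;

/** One entry of the proposed -l flag: {jar_file_path}/{dialect}/{class_name}. Hypothetical helper. */
final class ParserConfig {
    final String jarPath;
    final String dialect;
    final String className;

    ParserConfig(String jarPath, String dialect, String className) {
        this.jarPath = jarPath;
        this.dialect = dialect;
        this.className = className;
    }

    /** Parses a single entry. The last two '/'-separated segments are dialect and class name. */
    static ParserConfig parse(String entry) {
        int classSep = entry.lastIndexOf('/');
        int dialectSep = classSep < 0 ? -1 : entry.lastIndexOf('/', classSep - 1);
        boolean emptyPart = dialectSep <= 0              // missing or empty jar path
                || classSep - dialectSep == 1            // empty dialect
                || classSep == entry.length() - 1;       // empty class name
        if (emptyPart) {
            throw new IllegalArgumentException(
                    "expected {jar_file_path}/{dialect}/{class_name}, got: " + entry);
        }
        return new ParserConfig(
                entry.substring(0, dialectSep),
                entry.substring(dialectSep + 1, classSep),
                entry.substring(classSep + 1));
    }

    /** Parses a full flag value of the form {config1}:{config2}:... */
    static List<ParserConfig> parseAll(String flagValue) {
        List<ParserConfig> configs = new ArrayList<>();
        for (String entry : flagValue.split(":")) {
            configs.add(parse(entry));
        }
        return configs;
    }
}
```

One caveat with this format: using `:` as the separator assumes Unix-style paths, since a Windows path such as `C:\...` contains `:` itself.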
Well, I suggest using a more straightforward method:

- A `-l` flag configuring the directory of jar files, like `-l /opt/sqlflow/parser/adapters`; we can load all the jar files under this directory.
- A `-p` flag configuring all parser classes to load, like `org.sqlflow.parser.calcite.CalciteParserAdaptor`; we can then get the dialect name from the class name.
It looks good. Meanwhile, I think @typhoonzero's straightforward method is simpler.
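A rough sketch of the directory-based variant, under two assumptions that the thread does not confirm: only jars directly under the directory are loaded, and the dialect name is the last package segment of the class name (`calcite` in `org.sqlflow.parser.calcite.CalciteParserAdaptor`). `ParserDirectoryLoader`, `loaderFor`, and `dialectOf` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for the directory-based -l / -p flags. */
final class ParserDirectoryLoader {

    /** Builds one class loader over every .jar file directly under jarDir (the -l value). */
    static URLClassLoader loaderFor(File jarDir) throws IOException {
        File[] jars = jarDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IOException("not a readable directory: " + jarDir);
        }
        Arrays.sort(jars); // deterministic class path order
        List<URL> urls = new ArrayList<>();
        for (File jar : jars) {
            urls.add(jar.toURI().toURL());
        }
        return new URLClassLoader(
                urls.toArray(new URL[0]), ParserDirectoryLoader.class.getClassLoader());
    }

    /** Assumption: the dialect is the last package segment of the parser class name (a -p value). */
    static String dialectOf(String className) {
        int classDot = className.lastIndexOf('.');
        if (classDot <= 0) {
            throw new IllegalArgumentException("class name has no package: " + className);
        }
        String pkg = className.substring(0, classDot);
        return pkg.substring(pkg.lastIndexOf('.') + 1);
    }
}
```

Deriving the dialect from the package name keeps the flags short, but it ties the dialect name to a naming convention that every parser jar would have to follow.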
Shall we separate the `ClassLoader(..)` & `loadClass(..)` from `parse()`?
https://github.com/sql-machine-learning/sqlflow/blob/dbfbf655bfd63cfed44ae4b4559fe21529efb015/java/parser/src/main/java/org/sqlflow/parser/ParserGrpcServer.java#L95-L100
Agree. ClassLoader should be called only once during the initialization of the server.
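To make the separation concrete, here is a minimal sketch in which the reflective loading happens once in `register` (at server start-up) and `parse` is only a map lookup. `SqlParser` is a hypothetical stand-in for `ParserInterface`, whose real signature is not shown in this thread; `ParserRegistry` and the toy `EchoParser` are also made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ParserInterface; the real signature may differ. */
interface SqlParser {
    String parse(String sql);
}

/** Holds parser instances that were loaded once during server initialization. */
final class ParserRegistry {
    private final Map<String, SqlParser> byDialect = new HashMap<>();

    /** Start-up path: resolves the class and instantiates it exactly once per dialect. */
    void register(String dialect, String className, ClassLoader loader)
            throws ReflectiveOperationException {
        Class<? extends SqlParser> cls =
                Class.forName(className, true, loader).asSubclass(SqlParser.class);
        byDialect.put(dialect, cls.getDeclaredConstructor().newInstance());
    }

    /** Request path: no class loading, just a lookup and a call. */
    String parse(String dialect, String sql) {
        SqlParser parser = byDialect.get(dialect);
        if (parser == null) {
            throw new IllegalArgumentException("no parser loaded for dialect: " + dialect);
        }
        return parser.parse(sql);
    }
}

/** Toy parser used only to exercise the registry; it counts how often it is constructed. */
final class EchoParser implements SqlParser {
    static int instances = 0;

    EchoParser() {
        instances++;
    }

    @Override
    public String parse(String sql) {
        return "parsed: " + sql;
    }
}
```

With this shape, repeated `parse` calls for the same dialect reuse the instance created at start-up instead of constructing a new class loader per request.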
Instead of uploading the interface to the remote, we can simply install it locally.
Problem
SQLFlow calls the Java HiveQL/ODPS parsers to help parse SQL programs. As shown in the following dependency graph, each parser is wrapped as a gRPC server which takes a string and returns the parsed result.
However, some parsers, like the ODPS parser, can't be open-sourced. This leads to a circular dependency between the internal code and the open-sourced code, i.e. the open-sourced Java gRPC server needs to call the ODPS parser while the closed-source ODPS parser needs to return `ParseResult`.

Solution
We remove the dependency from the gRPC server to the ODPS parser via dynamic loading. To be specific, the open-sourced code creates the ODPS parser instance by the following.
By doing so, we only have a one-way dependency from the internal repo to the GitHub repo.
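A minimal sketch of what such dynamic loading could look like, assuming the parser class has a no-argument constructor. `DialectParser` is a hypothetical stand-in for the open-sourced interface, and `DynamicParserLoader` and `UpperCaseParser` are illustrative names only. The open-sourced side refers to the implementation purely by its class name as a string, so it has no compile-time dependency on the internal code.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical stand-in for the open-sourced parser interface; the real signature may differ. */
interface DialectParser {
    String parse(String sql);
}

final class DynamicParserLoader {

    /**
     * Loads className from the jar at jarPath and returns it as a DialectParser.
     * Only the interface is known at compile time; the implementation is resolved at run time.
     */
    static DialectParser load(String jarPath, String className)
            throws MalformedURLException, ReflectiveOperationException {
        URL[] urls = {new File(jarPath).toURI().toURL()};
        // The parent is the loader that defined DialectParser, so the server and the
        // loaded parser agree on the same interface class. The loader is intentionally
        // left open so the parser can keep loading its own classes lazily.
        ClassLoader loader = new URLClassLoader(urls, DialectParser.class.getClassLoader());
        Class<? extends DialectParser> cls =
                loader.loadClass(className).asSubclass(DialectParser.class);
        return cls.getDeclaredConstructor().newInstance();
    }
}

/** Toy implementation standing in for a closed-source parser; it just upper-cases its input. */
final class UpperCaseParser implements DialectParser {
    @Override
    public String parse(String sql) {
        return sql.toUpperCase();
    }
}
```

In a real deployment the implementation class would live in the internal jar rather than next to the interface; the toy class is kept here only so the sketch can be exercised without building a jar.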
Implementation Details

GitHub repo:

- [...] `ParserInterface`.

Internal repo:

- [...] `.jar` file then add it to the Maven project.
- [...] `.jar` file should be the same as the open-sourced `.jar` file, since they share the same entry point.

cc @typhoonzero @weiguoz