vesoft-inc / nebula-flink-connector

Flink Connector for Nebula Graph

Remove the Flink library and shade dependencies from the fat jar #78

Closed · linhr closed this 1 year ago

linhr commented 1 year ago
  1. I noticed that the Nebula Flink connector library available in the Maven Central repository is large (over 100 MB). It seems that the fat jar contains the entire Flink library as dependencies. Shall we mark the Flink dependencies as <provided> in pom.xml so that they are not included in the jar? A sketch of what this could look like is at the end of this comment. (BTW, do we have a plan to publish artifacts compatible with each recent Flink version? For example, for the Kafka connector I can pick flink-sql-connector-kafka-1.16.0.jar, where 1.16.0 is the Flink version I need.)
  2. It seems that third-party dependencies are not shaded in the fat jar, so there may be version conflicts with other libraries at runtime. Shall we shade all third-party dependencies under e.g. org.apache.flink.connector.nebula.shaded?

I've seen the above two issues taken care of in other official connectors (e.g. the Flink Elasticsearch connector). I have the corresponding code changes locally, and I'm happy to submit a PR if the idea makes sense.
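
For illustration, marking a Flink dependency as provided might look roughly like this in pom.xml (the artifact below is only an example, not necessarily one the connector declares):

    <!-- Example only: with provided scope, Flink classes come from the runtime
         environment instead of being bundled into the fat jar. -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java</artifactId>
        <version>${flink.version}</version>
        <scope>provided</scope>
    </dependency>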

wey-gu commented 1 year ago

Thanks so much @linhr

What do you think, @Nicole00?

Nicole00 commented 1 year ago
  1. Yeah, the size is not reasonable for a Maven dependency. I will submit a PR to change the packaging.
  2. Multi-version compatibility is in our plans, but it is not a high priority. I would appreciate it if you could contribute your local code.
  3. Shading does not look strictly necessary. Maybe we can just adjust the packaging; then, when users use the Flink connector, they can exclude the internal dependencies and import another version themselves, like this:
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-core</artifactId>
        <version>${flink.version}</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.commons</groupId>
                <artifactId>commons-lang3</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    What do you think? @linhr

linhr commented 1 year ago

@Nicole00 Thanks for your response!

I've created a PR (#79) to illustrate my ideas. Regarding your comments:

  1. I've marked all Flink dependencies as <provided>. This way, not only is the fat jar size reduced, but people can also deploy the connector jar to a Flink environment of a different version. (Before this change, that was not possible. For example, the nebula-flink-connector-3.3.0.jar file in the Maven repository was bundled with Flink 1.14, so when I deployed it to my Flink 1.16 environment I got errors in the job/task managers due to Java class conflicts.)
  2. I haven't looked into how multi-version support can be handled. I guess we need to set up unit/integration tests so that they run against each Flink version we want to support. I'm fine if it is not a high priority at this point, since users can build the connector from source and test against the Flink version they want.
  3. Dependency shading: There is no change to the Java code import statements, and it only affects the fat jar, so developers do not need to be aware of it. I feel it is cleaner than asking users to provide all third-party dependencies (and their transitive dependencies). In particular, PyFlink and Flink SQL users typically do not have Maven in their development setup, so a self-contained fat jar is more desirable. Dependency shading is standard practice and can be found in other official Flink connectors as well (e.g. the Kafka connector here). A rough sketch of the relocation configuration is shown below.
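
As a rough sketch (the relocated package pattern here is only an example, not necessarily what the PR uses), relocating third-party classes with the maven-shade-plugin could look like:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <!-- Example only: move commons-lang3 classes into the
                             connector's shaded namespace inside the fat jar. -->
                        <relocation>
                            <pattern>org.apache.commons.lang3</pattern>
                            <shadedPattern>org.apache.flink.connector.nebula.shaded.org.apache.commons.lang3</shadedPattern>
                        </relocation>
                    </relocations>
                </configuration>
            </execution>
        </executions>
    </plugin>
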
Nicole00 commented 1 year ago

@linhr Thank you very much for your contributions!

  1. I'm fine with marking the dependencies as <provided> and shading the dependencies; it achieves the same result as what I was going to do with the maven-assembly-plugin.
  2. For multi-version Flink support, we need to extract the code that is common across Flink versions and use profile properties to control the Flink version. Then, when deploying the snapshot or release connector, we just run the deploy action with the parameters -pl ${connector_directory} -am -Pflink-1.14, for example. A rough sketch of such profiles is shown below.
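
For illustration only (the profile ids and Flink versions here are hypothetical):

    <!-- Example only: each profile sets the flink.version property used by the build. -->
    <profiles>
        <profile>
            <id>flink-1.14</id>
            <properties>
                <flink.version>1.14.6</flink.version>
            </properties>
        </profile>
        <profile>
            <id>flink-1.16</id>
            <properties>
                <flink.version>1.16.0</flink.version>
            </properties>
        </profile>
    </profiles>

With a setup like this, running mvn deploy -Pflink-1.16 would build and publish against that Flink version.
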
Nicole00 commented 1 year ago

https://github.com/vesoft-inc/nebula-flink-connector/pull/79 resolved it.