snowflakedb / snowflake-kafka-connector

Snowflake Kafka Connector (Sink Connector)
Apache License 2.0

Distribute "slim" version of the connector #792

Open enzo-cappa opened 6 months ago

enzo-cappa commented 6 months ago

The current distribution of the connector is an uber JAR that contains all the dependencies. However, some of those dependencies are not needed in all cases, especially in production systems. For example:

Would it be possible to distribute a slim version of the connector alongside the current one? Just a JAR with the fundamental dependencies. Furthermore, it would be better to distribute it as a zip/tar.gz with the JAR files inside, like Debezium does (see the different types and classifiers at https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.5.1.Final/). This last part would make it easier to exclude individual JARs if needed (for example, to force a version bump when a 0-day vulnerability is discovered in a dependency).
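As a rough illustration of the version-bump use case: if the connector's dependencies were resolved as ordinary transitive dependencies rather than bundled into the uber JAR, a Gradle build could raise a single dependency's version with a dependency constraint. This is only a sketch under that assumption; the Jackson coordinates and version are just an example, and the `because` text is made up.

```groovy
// build.gradle (sketch; assumes the 'java' plugin and non-bundled dependencies)
dependencies {
    constraints {
        // Ask for at least this version wherever the module appears transitively.
        implementation('com.fasterxml.jackson.core:jackson-databind:2.17.2') {
            because 'example only: pick up a patched release after a vulnerability report'
        }
    }
}
```

A constraint like this has no effect on classes that are already packed inside an uber JAR, which is the limitation this issue is about.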

enzo-cappa commented 6 months ago

I just realized that this repo depends on the shaded snowflake-ingest, which means that several dependencies are duplicated. See the JAR versions at https://github.com/snowflakedb/snowflake-ingest-java/tree/master?tab=readme-ov-file#jar-versions

enzo-cappa commented 6 months ago

Another finding: the JDBC driver is also distributed as a fat JAR. Furthermore, both the JDBC driver and the Ingest SDK require different distributions to be FIPS compliant, and those are not used in this connector. This makes me assume that this connector is not FIPS compliant, and cannot be as long as the uber/shaded JARs are used.

sfc-gh-gjachimko commented 2 months ago

@enzo-cappa I'm very sorry for the late reply. I'll add an internal ticket to track this issue and discuss it. We shall see if we have some room for improvement here.

simonepm commented 1 month ago

Kudos to this. At the moment, connector v2.3.0 supports Kafka 3.7 and Confluent 7.6.

I do not know if it is overkill, but in order to make this version run with Kafka 3.5 and Confluent 7.5 we had to rebuild the JAR from the source code; otherwise a 'NoSuchMethodError' is thrown.

It would be great to be able to include just the stripped-down JAR as a dependency and put a different version of its Kafka and Confluent dependencies on the classpath for broader compatibility.

simonepm commented 1 week ago

I found a quick-and-dirty solution that does not require recompiling the whole JAR and that also allows a slim import to be used with a different set of dependency versions:

dependencies {
    implementation('com.snowflake:snowflake-kafka-connector:2.4.0') {
        transitive = false
        // exclude all original dependency groups:
        exclude group: 'org.bouncycastle'
        exclude group: 'org.apache.kafka'
        exclude group: 'net.snowflake'
        exclude group: 'org.apache.avro'
        exclude group: 'org.apache.commons'
        exclude group: 'com.fasterxml.jackson.core'
        exclude group: 'io.confluent'
        exclude group: 'io.dropwizard.metrics'
        exclude group: 'com.google.guava'
        exclude group: 'com.google.protobuf'
        exclude group: 'dev.failsafe'
        exclude group: 'org.slf4j'
    }
    // re-import all original dependencies at the required versions:
    implementation 'org.bouncycastle:bcpkix-fips:1.0.7'
    implementation 'org.apache.kafka:connect-api:3.5.2'
    implementation 'org.apache.kafka:kafka-clients:3.5.2'
    implementation 'net.snowflake:snowflake-jdbc:3.18.0'
    implementation 'net.snowflake:snowflake-ingest-sdk:2.2.0'
    implementation 'org.apache.avro:avro:1.11.3'
    implementation 'org.apache.commons:commons-compress:1.26.2'
    implementation 'com.fasterxml.jackson.core:jackson-core:2.17.2'
    implementation 'com.fasterxml.jackson.core:jackson-databind:2.17.2'
    implementation 'io.confluent:kafka-schema-registry-client:7.5.5'
    implementation 'io.confluent:kafka-avro-serializer:7.5.5'
    implementation 'io.confluent:kafka-connect-avro-converter:7.5.5'
    implementation 'io.confluent:kafka-schema-rules:7.5.5'
    implementation 'io.confluent:kafka-schema-registry-client-encryption:7.5.5'
    implementation 'io.dropwizard.metrics:metrics-core:4.2.26'
    implementation 'io.dropwizard.metrics:metrics-jmx:4.2.3'
    implementation 'com.google.guava:guava:32.0.1-jre'
    implementation 'com.google.protobuf:protobuf-java:3.25.4'
    implementation 'com.google.protobuf:protobuf-java-util:3.25.4'
    implementation 'dev.failsafe:failsafe:3.3.2'
    implementation 'org.slf4j:slf4j-api:1.7.36'
}

We pick the dependency imports from the original pom.xml file and change the incompatible versions (e.g. Confluent and Kafka libraries) by excluding all the dependency groups and re-importing them at the version needed.

Setting only transitive = false did not work for me, I guess because we are dealing with a fat JAR.
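To check that the exclusions and re-imports actually took effect, a small task along these lines could be added to the same build.gradle. It is a minimal sketch: the task name printRuntimeClasspath is made up for illustration, and it assumes the 'java' plugin is applied so that the runtimeClasspath configuration exists.

```groovy
// Hypothetical helper task: lists every resolved module and the JAR file it maps to.
tasks.register('printRuntimeClasspath') {
    doLast {
        // Resolve the runtime classpath and collect its artifacts.
        def artifacts = configurations.runtimeClasspath.resolvedConfiguration.resolvedArtifacts
        // Sort by "group:name:version" so that the same group at several versions stands out.
        artifacts.sort { it.moduleVersion.id.toString() }.each { artifact ->
            println "${artifact.moduleVersion.id} -> ${artifact.file.name}"
        }
    }
}
```

Running `./gradlew printRuntimeClasspath` should then show the Kafka and Confluent modules at the versions declared in the build. Note that this lists resolved modules only; it cannot reveal classes that are packed inside a fat JAR.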