snowflakedb / snowflake-jdbc

Snowflake JDBC Driver
Apache License 2.0
174 stars 166 forks source link

Light driver jar? #174

Closed rmannibucau closed 1 year ago

rmannibucau commented 5 years ago

Hi,

Current driver relocates:

At the end it has two important impacts for consumers:

  1. The driver is 30Mo fat (by comparison oracle driver is 4M and postgresql is less than 1M) even if most of these libraries are not used for several applications but it can't be excluded or dropped
  2. Several of the versions used by these libraries have CVE and it is almost never identified by scanning tools because of the relocation

There this issue is about proposing some alternative packaging of the driver. My proposal would be:

  1. Keep default jar aligned on maven (and not replace the pom by the reduced one without dependencies),
  2. Add a shade (current jar) with a classifier ("shaded" or "fatjar" sounds natural) to keep the compatibility and smooth experience for newcomers,
  3. (if possible) study if a driver can't be written without depenencies or just relying on standard API (javax/jakarta) and therefore avoid implementations. This enables to reduce a lot dependencies and let the user pick their preferred implementation.

Romain

smtakeda commented 5 years ago

Regarding CVE, Github detects all dependency issues and notifies us such that we fix them promptly. Also the internal monitoring tool does similar, so the security team notify as well. Can you please let me know if anything is missing?

Sorry about the size of jar but we had to include all of required dependencies to provide one jar for all solutions.

rmannibucau commented 5 years ago

@smtakeda your CVE point is only partial. Yes you can detect part of the CVE with github and fix them but then you still prevent the consumers to detect them and identify when to upgrade the driver. This is a work done by CI and fixed when needed, you can't expect projects to upgrade the driver at each release.

Concretely master has this issue:

[CVE-2016-3720] com.fasterxml.jackson.dataformat:jackson-dataformat-cbor:2.6.7 (https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2016-3720)

Also the size is stilla concern. I see multiple topics covered by the driver:

  1. let say SQL itself (statements)
  2. files
  3. aws

It makes at least 3 potential dependencies IMO, 1 being the minimal flavor.

smtakeda commented 5 years ago

Ok, filing a ticket for team review.

nboissel commented 1 year ago

Any news about this issue? We need to be able to patch CVEs independently.

Could you maybe provide a slim version in addition to the fatjar to allow both usecases?

martypitt commented 1 year ago

Another problem with the shading is that the snowflake jar breaks native compilation.

The related stackoverflow question with stack traces is available here.

After lots of trial and error, I tracked the offending jar down to the Snowflake JDBC driver. I presume you're shading in older dependencies, which are incompatible with GraaalVM.

As a result, we've had to drop snowflake support from our product. Would love to add it back, if you're able to ship a lightweight jar.

jdimeo commented 1 year ago

This is also now a bigger issue for us since our AWS Lambda has now exceeded the max of size unzipped. If we could have a unshaded Snowflake JDBC, I would be willing manage the dependency version conflicts since we already use many of the same dependencies and are paying "twice" for the space to include this JDBC driver. Can't you just publish a "use at your own risk" unshaded .jar to Maven but keep publishing the shaded .jar as the official driver?

matthenry87 commented 1 year ago

Now that I have a full understanding of what is going on with this driver (the shading), this is a huge problem from a security perspective.

You should just be including these as normal dependencies so that consumers can upgrade the versions at their own risk. You are likely hiding CVEs from of your consumers. I am going to have to bring this up to our security team.

jdimeo commented 1 year ago

Fantastic! thank you for this!

jtjeferreira commented 1 year ago

Was this really fixed in 3.13.33? The POM is still very minimal (there are only 2 dependencies) and the jar is 31M https://mvnrepository.com/artifact/net.snowflake/snowflake-jdbc/3.13.33

matthenry87 commented 1 year ago

Has the non-shaded jar been released?

jtjeferreira commented 1 year ago

Has the non-shaded jar been released?

I don't see it anywhere...

matthenry87 commented 1 year ago

@sfc-gh-wshangguan why was this closed if this hasn't been delivered?

jdimeo commented 1 year ago

I gave up, forked it, made the change: https://github.com/snowflakedb/snowflake-jdbc/commit/b0f9dbc63dbdd179b655d145c6e0857aab483c42 and published to our internal package registry...

We probably already had around 75% of the dependencies on our classpath

fgabolde commented 1 year ago

As mentioned in the related issue, the snowflake-jdbc jar is now over 60 MB (as of a Guava upgrade, as I understand?). I discovered this when a Renovate pipeline failed while trying to upgrade from 3.13.x to 3.14.2, because the uberjars we build have gone over GitLab's default artifact size limit. Of course we can raise the limit, but this is worrying; is the driver jar going to increase in size again?

These are Spring-MVC apps, so they already have quite a few dependencies, and just the Snowflake driver still contributes over 60% of their size on disk.

As a software developer, I empathize with you because shading often sounds like the easiest way to handle some issues, but also I want to tell you that you should probably not be stuffing as many dependencies into what is supposed to be a JDBC driver in the end. If some features require things outside of the scope of a JDBC driver, then consider not including them here, and provide a separate library or ecosystem for this. (A large part of the most recent increase in size seems to come from a net.snowflake.client.jdbc.internal.grpc package that used to be almost empty. Why is there so much GRPC in there?)

Like other commenters, I also have trouble with the security angle. If one of your dependencies becomes vulnerable, how is shading helping you? If your users are on top of their dependency vulnerabilities, they'll upgrade Guava or whatever along with your driver and everything is OK. If they're not, they're likely still running the old version of your JDBC driver (especially if the old version is 30MB lighter!) so they're still vulnerable.

jdimeo commented 1 year ago

In my above commit, this took two lines of code in the POM and I've been using that unshaded copy in production apps successfully. Why not offer both?

fabiencelier commented 12 months ago

One more comment to say that 60MB for a JDBC driver si too much. Please keep in mind that Snowflake is often NOT the only JDBC driver we need to include in an application. Our app for instance offer connectors to 10+ clients, so we don't want each of them to shad all the dependencies so we have 10 times the same dependencies shaded.

I also agree that shaded might appear as a simple solution to avoid conflicts but it goes against Java's philosophy of handling packages: It is very bad for package size and it is worse in term of security as users can't patch a vulnerability by themselves.

Please keep the shading as light as possible for very necessary use cases

As we have hard constraint on packages size, our company will be stuck on 3.13 until a lighter jar is released or we fork the project to build a lighter version ourselves like https://github.com/snowflakedb/snowflake-jdbc/issues/174#issuecomment-1615314543

mcouillard commented 11 months ago

Agreed, this JDBC Jar has gotten extremely large. It doubled in size from 3.13 to 3.14, but was already on the large size prior.