trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Add OceanBase connector #22780

Open whhe opened 1 month ago

whhe commented 1 month ago

I want to add a connector for OceanBase database based on the JDBC connector.

One issue needs to be confirmed: OceanBase provides MySQL-compatible and Oracle-compatible modes. In MySQL-compatible mode, users can connect directly with the MySQL JDBC driver, but Oracle-compatible mode requires the OceanBase JDBC driver, obconnector-j, which is licensed under the LGPL. In my past experience, this driver can be used with 'provided' or 'test' scope, and the project documentation can guide users to add it manually, but I'm not sure whether that is acceptable in the Trino community. Please let me know if I'm missing anything.

wendigo commented 1 month ago

@mosabua can you chime in on the licensing?

mosabua commented 1 month ago

With this license we cannot include the JDBC driver in the binaries we ship, which means we cannot include the connector in a functional manner. We have never shipped such a non-functional connector in the past. The only similar thing we do is in the Kafka connector, where we allow users to add jars with restrictive licenses to enable further decoders, but that connector is perfectly functional without them.

The best approach would be if the JDBC driver could be dual licensed with a license that is compatible with the Apache license or changed to such a license completely.

Theoretically, you could submit a PR for a connector that uses the MySQL JDBC driver and requires that mode, and then we add docs explaining that you can manually swap out the JDBC driver to use the Oracle-compatible mode. However, I think this would be a huge mess, since you would have to adjust a lot of the type mapping and other aspects on the fly. Would it be sufficient to create a connector for the MySQL mode only?

Lastly, in my experience "compatible" modes are always flawed. I have yet to see a database that uses a driver from another database and is actually fully compatible, without issues in "edge" cases that somehow always end up creeping in. That is probably the reason why you need to implement an OceanBase connector in the first place and can't just use the MySQL or Oracle connectors.

mosabua commented 1 month ago

I just looked and found that the linked Oracle-compatible JDBC driver is based on the MariaDB JDBC driver. That original JDBC driver is licensed under the LGPL, so the OceanBase driver (obconnector-j) can probably NOT be relicensed legally.

https://github.com/oceanbase/obconnector-j

mosabua commented 1 month ago

I think your best option is to create and maintain the connector in a separate repository and NOT ship a binary yourself. That is legally possible. You then have to rely on your users to build the connector, add the JDBC driver, and bundle that up privately so they can then use it in their own deployment.

mosabua commented 1 month ago

Also please correct me if I messed up any details @martint

martint commented 1 month ago

My understanding is that LGPL is compatible with the Apache License when software licensed under ASL links to and distributes libraries licensed under LGPL.

For some context, according to this article from GNU.org (https://www.gnu.org/licenses/lgpl-java.html):

Applications which link to LGPL libraries need not be released under the LGPL. Applications need only follow the requirements in section 6 of the LGPL: allow new versions of the library to be linked with the application; and allow reverse engineering to debug this.

If you distribute a Java application that imports LGPL libraries, it's easy to comply with the LGPL. Your application's license needs to allow users to modify the library, and reverse engineer your code to debug these modifications. This doesn't mean you need to provide source code or any details about the internals of your application.

When you distribute the library with your application (or on its own), you need to include source code for the library. But if your application instead requires users to obtain the library on their own, you don't need to provide source code for the library.

Unlike the GPL, the LGPL does not require software that links against LGPL-licensed code to also be licensed under the terms of the LGPL or GPL.

What I'm not 100% clear about is whether requiring anyone distributing the library to provide access to the source code of the library is a requirement that's incompatible with ASL. I believe all we need is to include a statement in the NOTICE file along the lines of:

This software includes a library licensed under the GNU Lesser General Public License (LGPL). You can obtain a copy of the source code of this library from [URL].

whhe commented 1 month ago

Hi @mosabua, thanks for your analysis and explanation.

It is just as you said: the driver cannot be relicensed. As for compatibility, it is true that OceanBase can only promise that most syntax is compatible, and some incompatibilities remain, which is indeed one of the reasons why I want to add a separate OceanBase connector.

Maintaining a separate repository is acceptable to me, but it may make it hard to keep up with the community's pace of iteration, so I would prefer to introduce the connector into the main Trino repository.

whhe commented 1 month ago

@martint Thank you for your explanation. It would be great if I only need to add a notice for the driver.

Because I am not familiar with the legal issues, my plan is to develop the connector with the OceanBase driver as an optional dependency (using the MySQL driver by default) and submit a pull request first. After the community reaches a consensus, I will commit additional adjustments as needed.
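
To make the optional-dependency idea concrete, here is a minimal sketch of how a connector might pick a driver per compatibility mode and fail clearly when the optional jar is absent. It is not taken from Trino or from any existing pull request: `DriverSelector`, `CompatibleMode`, and the method names are hypothetical, and `com.oceanbase.jdbc.Driver` is assumed to be the driver class shipped by obconnector-j.

```java
// Hypothetical sketch: names below are illustrative, not Trino APIs.
final class DriverSelector {
    enum CompatibleMode { MYSQL, ORACLE }

    // Map each compatibility mode to the JDBC driver class it needs.
    // The OceanBase class name is an assumption about obconnector-j.
    static String driverClassFor(CompatibleMode mode) {
        switch (mode) {
            case MYSQL:
                return "com.mysql.cj.jdbc.Driver";
            case ORACLE:
                return "com.oceanbase.jdbc.Driver";
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    // Check whether a class can be found without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, DriverSelector.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Return the driver class name, or fail with an actionable message
    // when the user has not added the optional jar.
    static String resolveDriver(CompatibleMode mode) {
        String driverClass = driverClassFor(mode);
        if (!isOnClasspath(driverClass)) {
            throw new IllegalStateException(
                    "JDBC driver " + driverClass + " not found; add its jar to the plugin directory");
        }
        return driverClass;
    }

    public static void main(String[] args) {
        for (CompatibleMode mode : CompatibleMode.values()) {
            String driverClass = driverClassFor(mode);
            System.out.println(mode + " -> " + driverClass + " (available: " + isOnClasspath(driverClass) + ")");
        }
    }
}
```

With this shape, a build that bundles only the MySQL driver would still start, and selecting the Oracle-compatible mode would fail early with a message telling the user which jar to supply.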

mosabua commented 1 month ago

There have been discussions along these lines on the ASF legal mailing list now and then, and it seems that projects like Hibernate (licensed under the LGPL) are not included in Apache binaries. This might be an ASF policy matter rather than an Apache License matter, though. I am not 100% sure.