tlabs-data / tablesaw-parquet

Parquet IO for Tablesaw
Apache License 2.0

Modularization of parquet support #74

Closed larshelge closed 1 year ago

larshelge commented 2 years ago

Many thanks for this useful library. Are there any plans to implement Parquet support for Tablesaw as a module or plugin? Tablesaw users would then no longer have to depend on a non-core implementation of Tablesaw, could upgrade to new Tablesaw versions immediately, and could plug in support for other file formats. Thank you.

ccleva commented 2 years ago

Hi @larshelge, thank you for your feedback.

Following a discussion on how to handle new file formats for Tablesaw, the Tablesaw maintainers decided that new file formats should be implemented in repositories outside of the jtablesaw org. As far as I know, there is currently no plan to change this decision.

I fully understand the need to upgrade Tablesaw as soon as a new version is released, which is why we try to release the corresponding tablesaw-parquet version as soon as possible. If you need to upgrade before we release, and the new Tablesaw version does not introduce breaking changes in the core API, you should be able to keep using the same tablesaw-parquet version. For example, this combination works fine:

<dependencies>
  <dependency>
    <groupId>tech.tablesaw</groupId>
    <artifactId>tablesaw-core</artifactId>
    <version>0.43.1</version>
  </dependency>
  <dependency>
    <groupId>net.tlabs-data</groupId>
    <artifactId>tablesaw_0.43.0-parquet</artifactId>
    <version>0.10.0</version>
  </dependency>
</dependencies>
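
As a rough illustration of the rule of thumb above, here is a minimal Java sketch that guesses whether a tablesaw-parquet artifact can be paired with a given tablesaw-core version. It rests on two assumptions that this thread does not confirm: that the artifact name always follows the `tablesaw_<version>-parquet` pattern seen in the example, and that releases sharing the same major and minor version do not break the core API. The class `TablesawCompat` and the method `likelyCompatible` are hypothetical names used only for illustration. They are not part of Tablesaw or tablesaw-parquet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TablesawCompat {
    // ASSUMPTION: naming scheme inferred from the single artifact shown in this thread.
    private static final Pattern ARTIFACT =
            Pattern.compile("tablesaw_(\\d+)\\.(\\d+)\\.(\\d+)-parquet");
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    // Heuristic only: treats versions with the same major.minor as compatible.
    // ASSUMPTION: patch releases do not introduce breaking core API changes.
    static boolean likelyCompatible(String parquetArtifactId, String coreVersion) {
        Matcher artifact = ARTIFACT.matcher(parquetArtifactId);
        Matcher core = VERSION.matcher(coreVersion);
        if (!artifact.matches() || !core.matches()) {
            throw new IllegalArgumentException(
                    "Unrecognized artifact or version: " + parquetArtifactId + ", " + coreVersion);
        }
        boolean sameMajor = artifact.group(1).equals(core.group(1));
        boolean sameMinor = artifact.group(2).equals(core.group(2));
        return sameMajor && sameMinor;
    }

    public static void main(String[] args) {
        // The combination from the example above.
        System.out.println(likelyCompatible("tablesaw_0.43.0-parquet", "0.43.1"));
    }
}
```

A check like this is only a first filter. The reliable test is still to build and run your own code against the new Tablesaw version.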

Note that after the upcoming release of Tablesaw 1.0, the API is expected to remain stable so that Tablesaw and tablesaw-parquet can be upgraded independently. We will update the tablesaw-parquet artifact naming and versioning scheme to reflect this.

ccleva commented 1 year ago

Closing this as it is not planned.