jtablesaw / tablesaw

Java dataframe and visualization library
https://jtablesaw.github.io/tablesaw/
Apache License 2.0
3.53k stars 641 forks source link

Circular static init in ColumnType prevents correct singleton init #1205

Open uosis opened 1 year ago

uosis commented 1 year ago

There is a circular dependency between singleton fields in ColumnType and singleton fields in its implementations. This makes correct initialization impossible.

Observe the following peculiar behavior (using Scala REPL for ease of illustration; eq in Scala is == in Java, i.e. reference equality):

> import $ivy.`tech.tablesaw:tablesaw-core:0.43.1`
> assert(tech.tablesaw.columns.numbers.DoubleColumnType.instance() eq tech.tablesaw.api.ColumnType.DOUBLE)
java.lang.AssertionError: assertion failed

This fails because static init of ColumnType is called from within static init of DoubleColumnType, while DoubleColumnType.INSTANCE field is still null, so ColumnType.DOUBLE receives a new instance from here, instead of the singleton value that eventually ends up in DoubleColumnType.INSTANCE field.

Let's try the other way:

> import $ivy.`tech.tablesaw:tablesaw-core:0.43.1`
> assert(tech.tablesaw.api.ColumnType.DOUBLE eq tech.tablesaw.columns.numbers.DoubleColumnType.instance())
// yay no error! but...
> tech.tablesaw.columns.numbers.DoubleColumnType.DEFAULT_PARSER.columnType()
res2: ColumnType = null

Now static init of DoubleColumnType is called from within static init of ColumnType, while ColumnType.DOUBLE is still null, so we get null here.

This breaks tablesaw-parquet here because it's doing reference equality comparisons.

> import $ivy.`tech.tablesaw:tablesaw-core:0.43.1`
> import $ivy.`net.tlabs-data:tablesaw_0.43.1-parquet:0.10.1`
> val table = tech.tablesaw.api.Table.create("t").addColumns(tech.tablesaw.api.DoubleColumn.create("c", 1, 2, 3))
> new net.tlabs.tablesaw.parquet.TablesawParquetWriter().write(table, net.tlabs.tablesaw.parquet.TablesawParquetWriteOptions.builder("test.parquet").build())
java.lang.NullPointerException: Cannot invoke "tech.tablesaw.api.DoubleColumn.getDouble(int)" because "this.doubleColumns[colIndex]" is null
  net.tlabs.tablesaw.parquet.TableProxy.getDouble(TableProxy.java:225)

I am not sure what is the correct solution here. Should tablesaw-parquet not assume singleton instances and use equals instead of ==? I am guessing most things are generally working because in most cases DoubleColumnType.instance() will run first, causing DoubleColumnType.instance() and ColumnType.DOUBLE to have different instances, but nothing getting null value.

Possibly similar to https://github.com/jtablesaw/tablesaw/issues/430.

ccleva commented 1 year ago

Fixed on tablesaw-parquet side with v0.11.0.

uosis commented 1 year ago

Fixed on tablesaw-parquet side with v0.11.0.

While this is great and probably the most expedient solution (thanks @ccleva for a quick fix!), the underlying problem of broken init is still there.

I am not aware of any practical issues it is causing at this point however, so this is probably not a very high priority.