jtablesaw / tablesaw

Java dataframe and visualization library
https://jtablesaw.github.io/tablesaw/
Apache License 2.0

JTableSaw deadlocks on column initialization #1230

Open howard-3 opened 1 year ago

howard-3 commented 1 year ago

Example code to reproduce

import tech.tablesaw.api.ColumnType; // only needed for the workaround line below
import tech.tablesaw.api.DoubleColumn;
import tech.tablesaw.api.StringColumn;

public class Test {

  public static void main(String[] args) throws Exception {
    // uncomment the next line to prevent the initialization deadlock.
    // ColumnType.values();
    Runnable r1 = () -> {
      StringColumn.create("abc");
    };
    Runnable r2 = () -> {
      DoubleColumn.create("def");
    };
    Thread t1 = new Thread(r1);
    Thread t2 = new Thread(r2);
    System.out.println("Starting");
    t1.start();
    t2.start();
    t1.join();
    t2.join();
    System.out.println("Done");
  }

}

Constructor flow

  graph TD;
      StringColumn --> |in constructor refers to| StringColumnType
      StringColumnType --> |is an extension of| AbstractColumnType
      AbstractColumnType --> |is an impl of|ColumnType
      ColumnType --> |initializes the final values of|StringColumnType

The root cause seems to be that the ColumnType class references the values of many *ColumnType classes when it initializes.

The deadlock can occur when StringColumn and another column type are constructed for the first time concurrently. One thread can hold the class initialization lock for StringColumn (and StringColumnType) while the other thread holds the one for DoubleColumn and DoubleColumnType. In that scenario, StringColumn cannot finish initializing because it depends on the initialization lock for ColumnType, which in turn depends on DoubleColumnType, whose lock is held by the other thread.
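To make the mechanism concrete without Tablesaw on the classpath, here is a minimal sketch of the same kind of cycle. Everything in it is a hypothetical stand-in: `Base` plays the role of the parent type whose static initializer refers to a subtype, and `Sub` plays the role of a concrete column type. It uses a superclass rather than an interface to keep the example small, and a latch plus a sleep to force the unlucky interleaving, so treat it as an illustration of the JVM behaviour rather than a description of Tablesaw's internals. The threads are daemons and are joined with a timeout so the program can exit even though they never finish.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in classes; none of these names come from Tablesaw.
final class InitCycleDemo {

  // Released once the first thread is inside Base's static initializer.
  static final CountDownLatch BASE_INIT_STARTED = new CountDownLatch(1);

  static class Base {
    static final Base DEFAULT;

    static {
      BASE_INIT_STARTED.countDown();
      sleepQuietly(500);       // give the other thread time to begin initializing Sub
      DEFAULT = Sub.INSTANCE;  // needs Sub, which the other thread is initializing
    }
  }

  static class Sub extends Base {
    // Initializing Sub first requires its superclass Base to be initialized.
    static final Sub INSTANCE = new Sub();
  }

  static void sleepQuietly(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Returns true if both threads are still blocked after the join timeouts. */
  static boolean bothThreadsStuck() {
    Thread t1 = new Thread(() -> {
      Object ignored = Base.DEFAULT; // triggers initialization of Base
    });
    Thread t2 = new Thread(() -> {
      try {
        BASE_INIT_STARTED.await();
      } catch (InterruptedException e) {
        return;
      }
      Object ignored = Sub.INSTANCE; // triggers initialization of Sub, then Base
    });
    t1.setDaemon(true);
    t2.setDaemon(true);
    t1.start();
    t2.start();
    try {
      t1.join(1000);
      t2.join(1000);
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return t1.isAlive() && t2.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("Both threads stuck: " + bothThreadsStuck());
  }
}
```

The first thread is in the middle of initializing `Base` and waits for `Sub`; the second is in the middle of initializing `Sub` and waits for `Base`. Neither wait shows up as a held Java monitor, which is why this kind of hang is easy to miss in a thread dump.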

My temporary fix is simply to ensure the class for ColumnType is loaded first.
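A sketch of what that workaround amounts to in general terms: force the shared type to initialize on one thread before any worker threads exist. The helper `forceInit` and the `Registry` class below are hypothetical names for illustration; the only library call involved is the standard `Class.forName(String, boolean, ClassLoader)` with `initialize` set to `true`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical example; Registry stands in for a type with a sensitive static initializer.
final class EagerInitDemo {

  // Set by Registry's static initializer so the effect is observable from outside.
  static final AtomicBoolean REGISTRY_INITIALIZED = new AtomicBoolean(false);

  static final class Registry {
    static {
      REGISTRY_INITIALIZED.set(true);
    }
  }

  /** Runs the static initializer of the given type on the calling thread, if it has not run yet. */
  static void forceInit(Class<?> type) {
    try {
      // A class literal alone does not initialize the class; forName with initialize=true does.
      Class.forName(type.getName(), true, type.getClassLoader());
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    forceInit(Registry.class); // do this before starting any worker threads
    System.out.println("Registry initialized: " + REGISTRY_INITIALIZED.get());
  }
}
```

In the reproduction above, the uncommented `ColumnType.values()` call serves the same purpose; I'd expect `forceInit(ColumnType.class)` to behave equivalently, but I have only tested the `values()` variant.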

Relevant JVM ticket: https://bugs.openjdk.org/browse/JDK-8037567

I'm happy to contribute a fix, but I'm not sure what the best approach is. Maybe move all the initialization for the different *ColumnTypes to a new class?
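Roughly what I have in mind, as a sketch with made-up names (`Kind`, `TextKind`, `NumberKind`, `Kinds`), not a proposal for the actual Tablesaw API: the parent type declares no constants of its subtypes, and a separate registry class holds them instead. Initialization dependencies then only point one way (registry to subtypes to parent), so there is no cycle for two threads to get caught in.

```java
// Hypothetical type hierarchy; these names are illustrative, not Tablesaw's.
final class RegistrySketch {

  // The parent type declares no constants that refer to its implementations.
  interface Kind {
    String name();
  }

  static final class TextKind implements Kind {
    private static final TextKind INSTANCE = new TextKind();
    static TextKind instance() { return INSTANCE; }
    @Override public String name() { return "TEXT"; }
  }

  static final class NumberKind implements Kind {
    private static final NumberKind INSTANCE = new NumberKind();
    static NumberKind instance() { return INSTANCE; }
    @Override public String name() { return "NUMBER"; }
  }

  // All cross-type constants live here; no subtype needs this class to initialize.
  static final class Kinds {
    static final Kind TEXT = TextKind.instance();
    static final Kind NUMBER = NumberKind.instance();
  }

  /** Touches a subtype and the registry from two threads; returns true if both finish. */
  static boolean initializeConcurrently() {
    Thread t1 = new Thread(() -> TextKind.instance());
    Thread t2 = new Thread(() -> {
      Kind ignored = Kinds.NUMBER; // initializes Kinds, which initializes both subtypes
    });
    t1.setDaemon(true);
    t2.setDaemon(true);
    t1.start();
    t2.start();
    try {
      t1.join(5000);
      t2.join(5000);
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return !t1.isAlive() && !t2.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("Both threads finished: " + initializeConcurrently());
  }
}
```

The cost is an API change: callers that currently reach the constants through the parent type would have to go through the new class, so this may need a deprecation path.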

lwhite1 commented 1 year ago

Tablesaw makes no claim of thread safety

(Sent as an email reply to howard-3's post of Tue, Aug 29, 2023, 12:02 PM.)