Open MedAnd opened 5 years ago
I would like to second this... as apache arrow is slowly becoming mainstream.
So Arrow is definitely gaining steam, but how would an arrow integration look for Trill?
et say they convert the internal columnar format to Arrow. Trill is still going to be mutating those structures constantly since it is an incremental platform, how would an integration work with those internal structures safely and what would it do with them?
A few months on, I can answer my own questions here.
The most obvious point is to open the door to very high performance interop with other applications or runtimes. Someone could write a database plugin that uses Trill operations to compute data.
Furthermore, bulk Arrow structures could be stored directly on disk and accessed via memory mapping. Or systems can make use of standardized readers to import CSVs or Parquet files into arrow structures.
Overall, using a standardized memory layout allows Trill to be integrated more easily and more efficiently with a wider array of projects. It is a very enticing benefit.
Support for Apache Arrow which is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication.
Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.
Related issues:
5
6
7