man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Investigate Apache Arrow for serialization interoperability #233

Open jamesblackburn opened 8 years ago

jamesblackburn commented 8 years ago

Wes McKinney has been working on Arrow (https://arrow.apache.org/, https://github.com/apache/arrow) as a DataFrame serialization and interoperability layer. Background reading: https://blog.cloudera.com/blog/2016/02/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/ and http://wesmckinney.com/blog/feather-and-apache-arrow/

Arrow has seen a fair bit of buy-in as a common data layer from the wider data science community, including interop with Spark, Pandas, Drill and Impala on the processing side, and Cassandra, HBase and others on the storage side.

Due to its uptake, Arrow also became an Apache Top-Level Project directly, bypassing the incubator: http://www.theregister.co.uk/2016/02/17/apache_arrow_toplevel_project/

If we make arctic Arrow-compatible, it may become easier to integrate arctic with downstream data processing systems.

lJoublanc commented 7 years ago

I'm curious, do you foresee this affecting the storage spec? Specifically,