rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
https://rerun.io/
Apache License 2.0

Tracking issue: Migrate from `re_arrow2` to `arrow` #3741

Open teh-cmc opened 11 months ago

teh-cmc commented 11 months ago

Currently blocked on:


Multiple end-goals:


TODO (split into sub-issues as needed):

On the way there we might hit a few bumps because we have a lot of redundant ad-hoc code that integrates with polars (which is built on top of arrow2).

The solution is to integrate with polars in a single place: the `Data{Cell,Row,Table}` layer (https://github.com/rerun-io/rerun/issues/1692). Once that's done, we can remove the ad-hoc polars code everywhere and just build a `Data{Cell,Row,Table}` whenever we want a `polars::Series`/`polars::DataFrame` (#1759).

Internally, the conversion from DataTable to polars::DataFrame will require a zero-copy tri-stage conversion from arrow1->arrow2->polars.
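To make the zero-copy claim concrete, here is a minimal sketch of the idea. It does not use the real `arrow`, `arrow2`, or polars APIs: `ArrowColumn`, `Arrow2Column`, `PolarsSeries`, and `to_series` are hypothetical stand-ins that each wrap a reference-counted buffer, on the assumption that each stage can hand over its buffer instead of copying it.

```rust
use std::sync::Arc;

// Stand-in types only: none of these are the real `arrow`, `arrow2`,
// or polars types. Each one wraps a reference-counted buffer.
#[derive(Debug, Clone)]
pub struct ArrowColumn {
    pub values: Arc<[f64]>,
}

#[derive(Debug, Clone)]
pub struct Arrow2Column {
    pub values: Arc<[f64]>,
}

#[derive(Debug, Clone)]
pub struct PolarsSeries {
    pub name: String,
    pub values: Arc<[f64]>,
}

impl From<ArrowColumn> for Arrow2Column {
    fn from(col: ArrowColumn) -> Self {
        // Moves the `Arc` handle; no element is copied.
        Self { values: col.values }
    }
}

impl Arrow2Column {
    pub fn into_series(self, name: &str) -> PolarsSeries {
        // Again only the handle moves; the buffer stays where it is.
        PolarsSeries {
            name: name.to_owned(),
            values: self.values,
        }
    }
}

/// Hypothetical helper chaining the stages: arrow -> arrow2 -> polars.
pub fn to_series(name: &str, col: ArrowColumn) -> PolarsSeries {
    Arrow2Column::from(col).into_series(name)
}
```

In this sketch the final `PolarsSeries` points at the same allocation as the original column, which can be checked with `Arc::ptr_eq`. Whether the real conversion is zero-copy for every datatype is a separate question that this sketch does not answer.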


emilk commented 2 months ago

re_arrow2 has an arrow feature, with glue for converting data between arrow and re_arrow2: https://docs.rs/re_arrow2/0.17.4/re_arrow2/array/trait.Arrow2Arrow.html

Using that we can start this migration piece-wise. It would double the dependencies for a transitional period, leading to longer compilation times and a bigger .wasm binary, but I think that is an ok tradeoff.
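A rough illustration of what "piece-wise" could look like in practice: migrated code works on the new array type and converts only at the boundary with code that has not been ported yet. The sketch is self-contained and does not call `re_arrow2` or `arrow`; `OldArray`, `NewArray`, the `Glue` trait, and both functions are hypothetical names invented for this example, not the actual `Arrow2Arrow` API.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for an array in the old and the new library.
#[derive(Debug, Clone, PartialEq)]
pub struct OldArray {
    pub data: Arc<[i64]>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct NewArray {
    pub data: Arc<[i64]>,
}

// Hypothetical glue trait; the method names are made up for this sketch.
pub trait Glue: Sized {
    fn to_new(&self) -> NewArray;
    fn from_new(array: &NewArray) -> Self;
}

impl Glue for OldArray {
    fn to_new(&self) -> NewArray {
        // Cloning an `Arc` bumps a reference count; the buffer is shared.
        NewArray {
            data: Arc::clone(&self.data),
        }
    }

    fn from_new(array: &NewArray) -> Self {
        OldArray {
            data: Arc::clone(&array.data),
        }
    }
}

// Not yet migrated: still takes the old type.
pub fn legacy_sum(array: &OldArray) -> i64 {
    array.data.iter().sum()
}

// Already migrated: takes the new type and converts only at the boundary.
pub fn migrated_sum(array: &NewArray) -> i64 {
    legacy_sum(&OldArray::from_new(array))
}
```

Once `legacy_sum` itself is ported, the conversion inside `migrated_sum` disappears without touching its callers, which is what lets the migration proceed one module at a time.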

Potential roadmap:

After de-chunkification:


As of 2024-07-08, there are only around 300 lines of Rust referencing the string arrow2 directly, when one ignores generated code.

Ignored paths: crates/re_types/**, crates/re_types_core/src/archetypes/**, crates/re_types_core/src/datatypes/**, crates/re_types_core/src/components/**, crates/re_types_blueprint/src/blueprint/components/**, crates/re_types_blueprint/src/blueprint/archetypes/**

jleibs commented 2 months ago

teh-cmc commented 2 weeks ago

Blocked on:

teh-cmc commented 1 week ago

New blocker: