rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
https://rerun.io/
Apache License 2.0

Split Tensor component into several archetypes #6832

Status: Open. Wumpf opened this issue 2 months ago.

Wumpf commented 2 months ago

Related to:

We generate archetypes and components for all tensor variants (TensorF32, TensorU8, etc.) and make sure they all share the same Visualizer:

archetype TensorU8 {
    buffer: BufferU8,

    // One of these two options:
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}

component BufferU8 {
    data: [u8],
}

archetype TensorF32 {
    buffer: BufferF32,

    // One of these two options:
    shape: TensorShape,
    shape: Vec<TensorDimension>,
}

component BufferF32 {
    data: [f32],
}
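A rough Rust sketch of how this could look, under simplified assumptions: `shape` is a plain `Vec<u64>` instead of `TensorShape`/`Vec<TensorDimension>`, and `TensorArchetype` and `TensorVisualizer` are hypothetical names, not Rerun's actual API. The idea it illustrates is that each per-type archetype implements one shared trait, so a single visualizer can handle all of them.

```rust
// Hypothetical sketch: one archetype per element type, all sharing one
// visualizer through a common trait. Not Rerun's actual API.

/// Component: flat buffer of `u8` elements.
pub struct BufferU8 {
    pub data: Vec<u8>,
}

/// Component: flat buffer of `f32` elements.
pub struct BufferF32 {
    pub data: Vec<f32>,
}

/// Archetype for `u8` tensors (shape simplified to `Vec<u64>`).
pub struct TensorU8 {
    pub buffer: BufferU8,
    pub shape: Vec<u64>,
}

/// Archetype for `f32` tensors.
pub struct TensorF32 {
    pub buffer: BufferF32,
    pub shape: Vec<u64>,
}

/// What the shared visualizer needs from any tensor archetype.
pub trait TensorArchetype {
    fn shape(&self) -> &[u64];
    fn num_elements(&self) -> usize;
    /// Element at a flat index, widened to `f64`.
    fn value_f64(&self, index: usize) -> Option<f64>;
}

impl TensorArchetype for TensorU8 {
    fn shape(&self) -> &[u64] {
        &self.shape
    }
    fn num_elements(&self) -> usize {
        self.buffer.data.len()
    }
    fn value_f64(&self, index: usize) -> Option<f64> {
        self.buffer.data.get(index).map(|&v| f64::from(v))
    }
}

impl TensorArchetype for TensorF32 {
    fn shape(&self) -> &[u64] {
        &self.shape
    }
    fn num_elements(&self) -> usize {
        self.buffer.data.len()
    }
    fn value_f64(&self, index: usize) -> Option<f64> {
        self.buffer.data.get(index).map(|&v| f64::from(v))
    }
}

/// A single visualizer that works with every tensor archetype.
pub struct TensorVisualizer;

impl TensorVisualizer {
    /// True if the element count implied by the shape matches the buffer length.
    pub fn is_valid(&self, tensor: &dyn TensorArchetype) -> bool {
        tensor.shape().iter().product::<u64>() == tensor.num_elements() as u64
    }

    /// (min, max) over all elements, e.g. to pick a color-map range.
    pub fn value_range(&self, tensor: &dyn TensorArchetype) -> Option<(f64, f64)> {
        let mut range: Option<(f64, f64)> = None;
        for i in 0..tensor.num_elements() {
            let v = tensor.value_f64(i)?;
            range = Some(match range {
                None => (v, v),
                Some((lo, hi)) => (lo.min(v), hi.max(v)),
            });
        }
        range
    }
}

fn main() {
    let gray = TensorU8 { buffer: BufferU8 { data: vec![0, 128, 255, 7] }, shape: vec![2, 2] };
    let depth = TensorF32 { buffer: BufferF32 { data: vec![-1.5, 0.0, 2.5] }, shape: vec![3] };
    let vis = TensorVisualizer;
    // Same visualizer, different archetypes.
    println!("{:?} {}", vis.value_range(&gray), vis.is_valid(&gray));
    println!("{:?} {}", vis.value_range(&depth), vis.is_valid(&depth));
}
```

The cost is visible in the duplicated impls: every new element type needs another buffer component, archetype, and trait impl, which is the boilerplate the proposal expects code-gen to absorb.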

Impact on the mesh's texture: instead of storing the texture on the mesh as a tensor, log an Image archetype at the same spot (entity path).

Detailed rationale (via @jleibs on https://github.com/rerun-io/rerun/issues/6388#issuecomment-2134003885):

Most of the choices for working with tensors fall into one of four categories.

Typed buffer, multiple data-types (the proposal)

Pros:

Cons:

The current hypothesis is that proliferating types is a known challenge that can mostly be automated with a mixture of code-gen and some helper code, whereas datatype conversions are an unknown challenge.

Still, this puts us on a path where, once we support multi-typed components, we can mostly delete a bunch of code and everything gets simpler. Any type conversions move from visualizer space to data-query space, but the types and Arrow representations we work with don't need to change.
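To make "conversions move to data-query space" concrete, here is a minimal sketch assuming a toy type-erased store. `Store`, `log`, and `query_f32` are hypothetical names, not Rerun's query API. Whatever element type was logged, the query layer hands the visualizer an `f32` buffer, so the visualizer keeps a single code path.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical toy store: each entity path holds one buffer of some
/// element type, stored type-erased. Not Rerun's real data store.
pub struct Store {
    buffers: HashMap<String, Box<dyn Any>>,
}

impl Store {
    pub fn new() -> Self {
        Store { buffers: HashMap::new() }
    }

    /// Log a buffer of any element type at `path`.
    pub fn log<T: 'static>(&mut self, path: &str, data: Vec<T>) {
        self.buffers.insert(path.to_owned(), Box::new(data));
    }

    /// Query-time conversion: the visualizer always receives `f32`,
    /// whatever element type was logged. Returns `None` for a missing
    /// path or an element type this sketch does not convert.
    pub fn query_f32(&self, path: &str) -> Option<Vec<f32>> {
        let any: &dyn Any = &**self.buffers.get(path)?;
        if let Some(v) = any.downcast_ref::<Vec<f32>>() {
            return Some(v.clone());
        }
        if let Some(v) = any.downcast_ref::<Vec<u8>>() {
            return Some(v.iter().map(|&x| f32::from(x)).collect());
        }
        if let Some(v) = any.downcast_ref::<Vec<u16>>() {
            return Some(v.iter().map(|&x| f32::from(x)).collect());
        }
        None
    }
}

fn main() {
    let mut store = Store::new();
    store.log("camera/depth", vec![1u16, 500, 65535]);
    store.log("camera/gray", vec![0u8, 255]);
    // The visualizer only ever sees f32.
    println!("{:?}", store.query_f32("camera/depth"));
    println!("{:?}", store.query_f32("camera/gray"));
}
```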

Untyped buffer with type-id

Pros

Cons
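The shape of this option can be sketched as follows, assuming little-endian byte order and the hypothetical names `DType` and `UntypedBuffer` (not from the issue). Every reader has to dispatch on the type id at runtime, and a byte count that doesn't fit the declared type can only be caught at runtime.

```rust
/// Hypothetical element-type tag stored next to the raw bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DType {
    U8,
    U16,
    F32,
}

impl DType {
    /// Size of one element in bytes.
    pub fn size(self) -> usize {
        match self {
            DType::U8 => 1,
            DType::U16 => 2,
            DType::F32 => 4,
        }
    }
}

/// Untyped buffer: raw little-endian bytes plus a type id.
pub struct UntypedBuffer {
    pub dtype: DType,
    pub bytes: Vec<u8>,
}

impl UntypedBuffer {
    pub fn from_f32(values: &[f32]) -> Self {
        let bytes = values.iter().flat_map(|v| v.to_le_bytes()).collect();
        UntypedBuffer { dtype: DType::F32, bytes }
    }

    /// Only a runtime check can tell whether the bytes fit the declared type.
    pub fn is_valid(&self) -> bool {
        self.bytes.len() % self.dtype.size() == 0
    }

    /// Number of complete elements.
    pub fn len(&self) -> usize {
        self.bytes.len() / self.dtype.size()
    }

    /// Every reader dispatches on the type id and reinterprets the bytes.
    pub fn get_f64(&self, i: usize) -> Option<f64> {
        let n = self.dtype.size();
        let chunk = self.bytes.get(i * n..(i + 1) * n)?;
        Some(match self.dtype {
            DType::U8 => f64::from(chunk[0]),
            DType::U16 => f64::from(u16::from_le_bytes([chunk[0], chunk[1]])),
            DType::F32 => f64::from(f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]])),
        })
    }
}

fn main() {
    let buf = UntypedBuffer::from_f32(&[1.5, -2.0]);
    println!("{:?} x{} valid={}", buf.dtype, buf.len(), buf.is_valid());
    println!("{:?}", buf.get_f64(1));
}
```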

Typed buffer with union

Pros

Cons
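A sketch of the union option, modeling the union as a Rust enum. `TensorBuffer` is a hypothetical name, and how it would map onto an Arrow union type is an assumption. The data stays typed and the compiler enforces exhaustive matches, but every consumer still branches on the variant.

```rust
/// Hypothetical union-style buffer component, modeled as a Rust enum.
#[derive(Clone, Debug, PartialEq)]
pub enum TensorBuffer {
    U8(Vec<u8>),
    U16(Vec<u16>),
    F32(Vec<f32>),
}

impl TensorBuffer {
    pub fn len(&self) -> usize {
        match self {
            TensorBuffer::U8(v) => v.len(),
            TensorBuffer::U16(v) => v.len(),
            TensorBuffer::F32(v) => v.len(),
        }
    }

    /// Consumers still branch on the variant, but the compiler rejects
    /// a match that forgets one.
    pub fn to_f64(&self) -> Vec<f64> {
        match self {
            TensorBuffer::U8(v) => v.iter().map(|&x| f64::from(x)).collect(),
            TensorBuffer::U16(v) => v.iter().map(|&x| f64::from(x)).collect(),
            TensorBuffer::F32(v) => v.iter().map(|&x| f64::from(x)).collect(),
        }
    }
}

/// One archetype with one buffer component; the element type lives in the union.
pub struct Tensor {
    pub shape: Vec<u64>,
    pub buffer: TensorBuffer,
}

fn main() {
    let t = Tensor { shape: vec![2], buffer: TensorBuffer::U16(vec![1, 65535]) };
    println!("{:?} {:?}", t.shape, t.buffer.to_f64());
}
```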

emilk commented 2 months ago

An alternative is to have many Buffer components (BufferU8, BufferU16, …) but only a single Tensor archetype:

archetype Tensor {
    shape: TensorShape,
    dimension_names: Option<DimensionNames>,

    // Set exactly one of these:
    buffer_u8: Option<BufferU8>,
    buffer_u16: Option<BufferU16>,
    buffer_u32: Option<BufferU32>,
    …

    color_model: Option<ColorModel>, // to interpret this tensor as an image
}

I believe this will lead to a lot less duplicated code.
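A minimal Rust sketch of this single-archetype layout, assuming simplified stand-ins (`Vec<u64>` for `TensorShape`, `Vec<String>` for `DimensionNames`, `color_model` omitted) and a hypothetical `validate` method. It shows the one piece the layout leaves implicit: "set exactly one of these" becomes a runtime check rather than something the type system enforces.

```rust
// Hypothetical sketch of the single-archetype layout; not Rerun's API.
pub struct BufferU8(pub Vec<u8>);
pub struct BufferU16(pub Vec<u16>);
pub struct BufferU32(pub Vec<u32>);

/// One archetype with an optional slot per element type.
#[derive(Default)]
pub struct Tensor {
    pub shape: Vec<u64>,                      // stand-in for TensorShape
    pub dimension_names: Option<Vec<String>>, // stand-in for DimensionNames
    pub buffer_u8: Option<BufferU8>,
    pub buffer_u16: Option<BufferU16>,
    pub buffer_u32: Option<BufferU32>,
}

#[derive(Debug, PartialEq)]
pub enum TensorError {
    NoBuffer,
    MultipleBuffers(usize),
    ShapeMismatch { expected: u64, actual: u64 },
}

impl Tensor {
    /// Enforce "set exactly one of these" and check the shape against it.
    pub fn validate(&self) -> Result<(), TensorError> {
        let lens: Vec<usize> = [
            self.buffer_u8.as_ref().map(|b| b.0.len()),
            self.buffer_u16.as_ref().map(|b| b.0.len()),
            self.buffer_u32.as_ref().map(|b| b.0.len()),
        ]
        .iter()
        .flatten()
        .copied()
        .collect();
        let actual = match lens.as_slice() {
            [] => return Err(TensorError::NoBuffer),
            [len] => *len as u64,
            many => return Err(TensorError::MultipleBuffers(many.len())),
        };
        let expected: u64 = self.shape.iter().product();
        if expected != actual {
            return Err(TensorError::ShapeMismatch { expected, actual });
        }
        Ok(())
    }
}

fn main() {
    let t = Tensor {
        shape: vec![2, 2],
        buffer_u8: Some(BufferU8(vec![1, 2, 3, 4])),
        ..Default::default()
    };
    println!("{:?}", t.validate());
}
```

The duplication shrinks to the buffer components plus one match arm per type, while the visualizer only has to know about one archetype.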