rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
https://rerun.io/
Apache License 2.0

Proposal: avoid empty arrays #6602

Open jleibs opened 1 month ago

jleibs commented 1 month ago

Problem summary

There are many different ways that a query can return a logical "no value".

Depending on both the kind of missing data and the particular query API used, some of these return None and some return []. I doubt anybody on the team can say authoritatively which does which.

Handling this inconsistency across all the code that runs queries is a source of pain, especially when chaining and falling back across overrides, defaults, etc., and when clears may be present before or after garbage collection.

Proposal

I propose that we simply never allow Some(EmptyArray()) as a result.

Anywhere that might return Some(EmptyArray()) should always return None instead.

The main rationale is that Rust has good ergonomic support for working with Option values (map, or_else, ?, etc.), which should simplify many of the code paths where we do this handling.
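As a rough illustration of the ergonomics argument, here is a minimal sketch that collapses Some(empty) into None at the boundary and then chains fallbacks with or_else. The names non_empty and resolve, and the override/store/default ordering, are assumptions made up for this example and are not taken from the Rerun codebase.

```rust
/// Collapse `Some(empty)` into `None`, so callers only ever see one "no value" case.
/// (Hypothetical helper, not an existing Rerun API.)
fn non_empty<T>(values: Option<Vec<T>>) -> Option<Vec<T>> {
    values.filter(|v| !v.is_empty())
}

/// Hypothetical fallback chain: take the override if it has data,
/// otherwise the stored value, otherwise the default.
fn resolve<T>(
    override_value: Option<Vec<T>>,
    store_value: Option<Vec<T>>,
    default_value: Option<Vec<T>>,
) -> Option<Vec<T>> {
    non_empty(override_value)
        .or_else(|| non_empty(store_value))
        .or_else(|| non_empty(default_value))
}
```

With the normalization done once, every later step can rely on None being the only "no value" case and does not need to check for emptiness separately.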

More extreme proposal (Retracted: see comments)

Going a step further, I think we should try to forbid non-null zero-length-lists from the chunk data itself. Anywhere in a chunk where our outer-most ListArray may contain zero-length elements, we should have a validity map, and we should update that validity map to set the null bit for any zero-length list.

This gives us a validity-map-only fast path for always identifying whether we have "real data" for a given component in a row of a chunk.
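To make the (retracted) idea concrete, the sketch below models a list column with plain vectors instead of real Arrow arrays. ListColumn, its fields, and both methods are hypothetical stand-ins invented for this example; the actual chunk layout may differ.

```rust
/// Simplified stand-in for a list column (not a real Arrow type).
/// Row `i` covers the element range `offsets[i]..offsets[i + 1]`.
struct ListColumn {
    offsets: Vec<usize>,
    /// One flag per row: `true` means valid (non-null).
    validity: Vec<bool>,
}

impl ListColumn {
    /// Set the null bit for every zero-length row.
    fn null_out_empty_rows(&mut self) {
        for (row, valid) in self.validity.iter_mut().enumerate() {
            if self.offsets[row + 1] == self.offsets[row] {
                *valid = false;
            }
        }
    }

    /// Fast path: after normalization, validity alone tells us
    /// whether the row holds "real data".
    fn has_data(&self, row: usize) -> bool {
        self.validity[row]
    }
}
```

Once null_out_empty_rows has run, has_data never needs to look at the offsets.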

emilk commented 1 month ago

I think this makes sense for the high-level query API: the viewer should treat null and [] the same (e.g. use default values for that component), so we should make sure the type system encodes it (with something semantically equivalent to Option<Vec1>).
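One way to encode this in the type system is a wrapper that cannot be constructed from an empty vector. The sketch below uses a hypothetical NonEmptyVec type in the spirit of Vec1; it is not the vec1 crate's API, and to_query_result is likewise an invented name.

```rust
/// Hypothetical non-empty vector, similar in spirit to `Vec1`.
#[derive(Debug, PartialEq)]
struct NonEmptyVec<T>(Vec<T>);

impl<T> NonEmptyVec<T> {
    /// Returns `None` for an empty input, so emptiness cannot leak through.
    fn new(values: Vec<T>) -> Option<Self> {
        if values.is_empty() {
            None
        } else {
            Some(Self(values))
        }
    }

    /// Infallible, because the constructor guarantees at least one element.
    fn first(&self) -> &T {
        &self.0[0]
    }

    fn as_slice(&self) -> &[T] {
        &self.0
    }
}

/// Treat null and `[]` the same: both become `None`.
fn to_query_result<T>(raw: Option<Vec<T>>) -> Option<NonEmptyVec<T>> {
    raw.and_then(NonEmptyVec::new)
}
```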

However, I think the extreme proposal will cause problems:

"we should try to forbid non-null zero-length-lists from the chunk data itself"

Our latest-at semantics look for the first non-null value (even if it is an empty array), so null means "keep looking" and [] means "stop here and return nothing for this component".

In https://github.com/rerun-io/rerun/issues/3381 we proposed an ergonomic API for logging empty arrays. Logging pos=[a,b,c], color=null means "keep previous colors, if any" while logging pos=[a,b,c], color=[] means "use no/default colors".
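The difference between null and [] under these latest-at semantics can be sketched as follows. This is a toy model under stated assumptions: Cell and latest_at are hypothetical names, rows are indexed by position instead of by timeline, and the component is a plain u32.

```rust
/// One logged row for a component:
/// `None` is null, `Some(vec![])` is an explicitly logged empty array.
type Cell = Option<Vec<u32>>;

/// Walk backwards from row `at` and return the first non-null cell,
/// even if that cell is an empty array.
fn latest_at(rows: &[Cell], at: usize) -> Option<&[u32]> {
    rows.iter()
        .take(at + 1)
        .rev()
        .find_map(|cell| cell.as_deref())
}
```

In this model, a null row is skipped and an earlier color is found, whereas a row holding [] ends the search and yields an empty slice. That is the distinction that would be lost if empty lists were rewritten to null in the chunk data.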

emilk commented 1 month ago

If we do this, we should try to do it well, so that the returned object is similar to Vec1 in that it: